







PSYCHOLOGICAL REVIEW PUBLICATIONS          APRIL, 1927

Journal of Experimental Psychology

EDITED BY
MADISON BENTLEY, UNIVERSITY OF ILLINOIS

HOWARD C. WARREN, PRINCETON UNIVERSITY (Review)
JOHN B. WATSON, NEW YORK CITY (Review)
RAYMOND DODGE, YALE UNIVERSITY (Monographs)
S. W. FERNBERGER, UNIVERSITY OF PENNSYLVANIA (Bulletin)
WALTER S. HUNTER, CLARK UNIVERSITY (Index)

HERBERT S. LANGFELD, PRINCETON UNIVERSITY, Business Editor





CONTENTS

Prediction of the Reliability of Mental Tests and Tests of Special
Abilities: LYLE H. LANIER
A Convenient Mirror-Drawing Device:
HARVEY C. LEHMAN AND PAUL A. WITTY
The Relation Between Physique and Performance:
GEORGE J. MOHR AND RALPH H. GUNDLACH
An Analysis of Eye Movements in the Reading of Chinese:
EUGENE SHEN
A Further Contribution to the Tactual Perception of Form:
M. J. ZIGLER AND REBECCA BARRETT
The Mirror Tachistoscope in the Drill Laboratory: GLENN D. HIGGINSON
The Fixational Pause of the Eyes: P. W. COBB AND F. K. MOSS





PUBLISHED BI-MONTHLY 
FOR THE AMERICAN PSYCHOLOGICAL ASSOCIATION 


BY THE PSYCHOLOGICAL REVIEW COMPANY 
PRINCE AND LEMON STS., LANCASTER, PA.
and PRINCETON, N. J. 


Entered as second-class matter, July 1, 1920, at the post-office at Lancaster, Pa 










Psychological Review Publications 
of the American Psychological Association 
EDITED BY 


HOWARD C. WARREN, PRINCETON UNIVERSITY (Review)
JOHN B. WATSON, 244 MADISON AVE., NEW YORK (Review)
RAYMOND DODGE, YALE UNIVERSITY (Monographs)
MADISON BENTLEY, UNIVERSITY OF ILLINOIS (J. of Exp. Psych.)
S. W. FERNBERGER, UNIVERSITY OF PENNSYLVANIA (Bulletin)
WALTER S. HUNTER, CLARK UNIVERSITY (Index)


HERBERT S. LANGFELD, Princeton University, Business Editor 


WITH THE CO-OPERATION OF 
MANY DISTINGUISHED PSYCHOLOGISTS


PSYCHOLOGICAL REVIEW 


containing original contributions only, appears bimonthly, January, March,
May, July, September, and November, the six numbers comprising a volume 
of about 480 pages. 


PSYCHOLOGICAL BULLETIN 


containing critical reviews of books and articles, psychological news and notes, 
university notices, and announcements, appears monthly, the annual volume 
comprising about 720 pages. Special issues of the BULLETIN consist of general 
reviews of recent work in some department of psychology. 


JOURNAL OF EXPERIMENTAL PSYCHOLOGY 


containing original contributions of an experimental character, appears bimonthly,
February, April, June, August, October, and December, the six numbers
comprising a volume of about 480 pages.


PSYCHOLOGICAL INDEX 


is a compendious bibliography of books, monographs, and articles upon psycho- 
logical and cognate topics that have appeared during the year. The INDEX is 
issued annually in May, and may be subscribed for in connection with the 
periodicals above, or purchased separately. 


PSYCHOLOGICAL MONOGRAPHS 


consist of longer researches or treatises or collections of laboratory studies
which it is important to publish promptly and as units. The price of single
numbers varies according to their size. The MONOGRAPHS appear at irregular
intervals and are gathered into volumes of about 500 pages.


ANNUAL SUBSCRIPTION RATES 


Review: $5.00 (Foreign, $5.25). Review and Bulletin: $9.50 (Foreign, $10.00) 
Journal: $5.00 (Foreign, $5.25). Review and Journal: $9.00 (Foreign, $9.50) 
Bulletin: $5.50 (Foreign, $5.75). Journal and Bulletin: $9.50 (Foreign, $10.00) 
An a of — Mn poe $1.50 oy tm $ ' 
Review, Bulletin, and Journal: $14.00 (Foreign, $14.75).
Review, Bulletin, Journal, and Index: $15.00 (Foreign, $15.75). 
Current Numbers: Review or Journal, $1.00; Bulletin, 60c; Index, $2.00 
Psychological Monographs: $6.00 per volume (Foreign, $6.30). 
Current Issues: prices vary according to size.





Subscriptions, orders, and business communications may be sent direct to the 
PRINCETON, N. J. OFFICE 


PSYCHOLOGICAL REVIEW COMPANY 


FOREIGN AGENTS: G. E. STECHERT & CO., London (2 Star Yard, Carey St.,
W.C.), Paris (16, rue de Condé)














Journal of 


Experimental Psychology 








VOL. X, No. 2          APRIL, 1927


PREDICTION OF THE RELIABILITY OF MENTAL 
TESTS AND TESTS OF SPECIAL ABILITIES 


BY LYLE H. LANIER ¹


New York University 


The Spearman prophecy-formula is now widely used to
predict from a single performance the reliability of any given
test. As a rule the test is divided into equal halves, and the
coefficient given when the two halves are correlated is substituted
in the formula to predict the reliability for any desired
length of the test. The formula is written as follows:

$$r_{nn} = \frac{n\,r_{xx'}}{1 + (n - 1)\,r_{xx'}} \tag{I}$$


where $r_{nn}$ is the reliability coefficient to be determined, $r_{xx'}$
the correlation between the two parts, and $n$ the number of
times the length of the test is to be increased (or the multiple
of increase in the number of subjects). This formula furnishes
a very convenient method for estimating the reliability of
tests whose reliability is otherwise difficult to determine;
e.g., Wood’s use of the method to determine the reliability


1 Materials for the present study were collected while the writer was a research 
assistant for the Committee on the Scientific Problems of Human Migration, National 
Research Council, under the direction of Dr. Joseph Peterson. The writer is deeply 
indebted to Dr. Peterson for his very kind assistance and encouragement at all times 
and to the Committee for financial assistance in carrying out an extensive study of 
racial traits. The present study is based upon data taken from the general racial study. 
The writer is grateful to Dr. S. C. Garrison of George Peabody College and to President 
P. A. Lyon of the Middle Tennessee State Teachers College for cooperation in the 
matter of securing subjects for the tests, and to his brother, A. C. Lanier, Jr., for 
valuable aid in the calculation of the correlation coefficients. 




of the examination in algebra and geometry of the College 
Entrance Examination Board (17). But at the same time 
the formula involves quite definite mathematical assumptions 
and should be subjected to experimental study before it is 
released upon an unsuspecting and statistically untrained 
group of practical educators and psychologists. If the pre- 
dictions made when using the formula have a fair degree of 
accuracy, then it is a valuable instrument in psychological 
and educational research; but if its predicted reliabilities 
have no validity, the wide use which it seems about to 
enjoy will lead to fallacious and misleading conclusions. The 
present study is an attempt experimentally to test the 
predictions made with the formula and to analyze the varia- 
tions found in applying it to various kinds of material. 
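The prophecy-formula (I) stated above may be put in computable form. The following is only a minimal sketch: the function name `spearman_brown` and the half-test correlation of .60 are assumptions introduced for illustration and do not come from the data of this study.

```python
def spearman_brown(r_parts: float, n: float) -> float:
    """Predicted reliability of a test n times as long, per formula (I).

    r_parts is the correlation between the two comparable parts;
    n is the multiple by which the length is increased.
    """
    denominator = 1 + (n - 1) * r_parts
    if denominator <= 0:
        raise ValueError("formula (I) is undefined for this r and n")
    return n * r_parts / denominator


# Hypothetical case: two halves of a test correlate .60, and the
# reliability of the whole test (n = 2) is wanted.
whole_test = spearman_brown(0.60, 2)
print(round(whole_test, 3))  # 0.75
```

With n = 1 the function returns the part correlation unchanged, which is a convenient check on any implementation.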

First, in order to afford a more comprehensive view of 
the problem to persons not well acquainted with its history 
and its implications, the various concepts of reliability will 
be briefly summarized, the development of the prediction 
formula treated somewhat in detail, and the few experimental 
studies of its validity and applicability considered. 


HISTORICAL

A. The Concept of Reliability.—It may be well here to
emphasize the distinction between the validity and the relia- 
bility of a test. Validity is determined by the extent to 
which it really measures what it purports to measure, while 
its reliability is a measure of the consistency of the results 
obtained with it. Let us suppose that we attempt to measure 
the intelligence of adult individuals by finding their height in 
inches. Obviously this is not a true, that is to say, a valid, 
measure of the intelligence of the individuals studied; but it 
is nevertheless a reliable method of measurement in the sense 
that the measurements of height are consistent in repeated 
experiments. An example of a highly valid measurement is 
that of the height of a wall determined by means of a foot- 
rule. This is a ‘true’ measure in that it represents accurately 
a specific property of the object measured. At the same 
time it is a highly reliable measure, as are all measures of
high validity. We just saw, however, that the contrary is
not true. It is true that all highly reliable measures are 
valid measures of something, but they are not necessarily 
good measures of the trait to be measured. The ‘height in 
inches’ is a true representation of the amount of the distance 
between an individual’s feet and the top of his head, and it is, 
in this respect, valid; but it is no indication of his intelligence. 
Thus an intelligence test may be reliable but not valid. 


Spearman, in 1904, introduced the reliability coefficient as a part of his formula
for the elimination of the effects of observational errors (commonly called “correction
for attenuation”). He defined it as “the average correlation between one and another
of the several independently obtained series of values” for any given ability or
function (13). The reliability coefficient meant for him the correlation between
comparable tests of the same function, and not that between repeated tests. Brown
(1910) used the term to refer to the extent to which “the amalgamated series of results
of the two tests would correlate with a similar amalgamated series of two other
applications of the same test” (1). As Kelley points out, such a correlation between repeated
applications of the same test would lead to too high an estimate of its reliability where
there is a possibility of direct memory transference (7). He says: “We may call
such a situation one in which there is a correlation between errors, meaning that
whatever elements of uncertainty or chance operated in the solution of the first question,
they would tend to operate in the same manner in the solution of the second” (p. 203).
In the 1921 edition of The Essentials of Mental Measurement (Brown and Thomson),
the term ‘reliability coefficient’ is used to refer to the correlation between “closely
similar” tests (2, p. 155).

Recently Kelley has taken exception to an interpretation of reliability given by
Crum (3), whose statement he interprets to mean that the reliability coefficient refers
to the correlation between two tests that differ only in ‘chance’ factors (8). Such a
condition would result, it is said, if the same test were repeated and the correlation
between the two calculated. Kelley objects that this will not give the true reliability
coefficient, but that such a coefficient will be given when two tests are correlated which
differ not only in the chance factors but in certain other ‘unique’ factors and which
possess in common a factor that does not belong to the ‘field’ of observation. It is
difficult to see how Kelley includes this latter factor in his tests if he aims at a true
reliability coefficient. It may be impossible in practice to devise tests that include
no factors common to each other and yet not to the field in general, i.e., duplicate tests
that are perfectly valid measures of the given ability or complex of abilities; but this
might at least be an ideal aim.


Crum contends that, if a test can be devised whose halves 
have a correlation, it should be equally easy to construct a 
second test that will have just as high reliability (3, p. 300). 
Hence he concludes that the practical problem of estimating 
reliability is of relatively little moment in the case of a test 
designed to measure a single capacity. The first part of his
statement may be true, but his conclusion by no means
follows. Any person familiar with mental testing knows that 
the situation is not so simple as this. Changes in attitude 
and circumstances that are quite beyond the experimenter’s 
control (often even beyond his apprehension) operate to 
influence the results and hence to affect their reliability. It 
is a problem to determine in the case of a particular kind of 
test to what exact extent a similar test can be expected to 
give the same results. The old faculty psychology underlies
such a conception as Crum’s. He conceives the individual as
constituted of a multiplicity of functions that are more or 
less constant and distinct in their operation. Hence if we 
restrict our range of testing to one of these functions it should 
always give the same result and to go about measuring its 
reliability is an entirely gratuitous task. The experimental 
results of the present paper lend no support to such a view. 
It will presently be shown that when we restrict our range of 
measurement to such a comparatively simple performance as 
speed in making dots or ability to discriminate the pitch of 
two tones the problem of determining the reliability of such 
measurement is no less real and difficult. It may develop 
that the Spearman prophecy-formula is inadequate to the 
task; but our criticism will come from an empirical, not from 
an a priori, standpoint, and not from an attempt to dispose 
of the question by denying that it exists. 

B. The Development of the Prediction Formula.—The 
formula for the elimination of observational errors, which 
included the reliability coefficient for the first time, as we 
mentioned above, was written by Spearman in 1904 as 
follows: 


$$r_{pq} = \frac{r_{p'q'}}{\sqrt{r_{p'p'} \cdot r_{q'q'}}} \tag{II}$$

where

$r_{p'q'}$ = “the mean of the correlations of each series of
values obtained for p with each series obtained for q.”
$r_{p'p'}$ = “the average correlation between one and
another of the several independently obtained values for p.”
$r_{q'q'}$ = “the same as regards q.”
$r_{pq}$ = “the required real correlation between p and q.”

In presenting this formula, Spearman admitted that it was
based upon the following assumptions: (1) that the errors in
the several series are uncorrelated, and (2) that neither of
the actual measurements be connected with one of the other
series quite independently of p (the function being measured).
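The correction expressed by formula (II) can be illustrated with a short computation. This is a sketch under stated assumptions: the function name `correct_for_attenuation` and the three coefficients used are hypothetical and are not taken from Spearman or from the present data.

```python
import math


def correct_for_attenuation(r_observed: float, rel_p: float, rel_q: float) -> float:
    """Estimate the 'real' correlation between p and q, per formula (II).

    r_observed is the mean observed correlation between the p and q series;
    rel_p and rel_q are the reliability coefficients of p and q.
    """
    if rel_p <= 0 or rel_q <= 0:
        raise ValueError("reliability coefficients must be positive")
    return r_observed / math.sqrt(rel_p * rel_q)


# Hypothetical values: observed correlation .40, reliabilities .64 and .81.
corrected = correct_for_attenuation(0.40, 0.64, 0.81)
print(round(corrected, 3))  # 0.556
```

Since the divisor is never greater than 1, the corrected value is never smaller in magnitude than the observed one; when the reliabilities are low or are themselves chance values, the correction can exceed 1.00, which is one reason the assumptions named above deserve attention.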

This formula was presented by Spearman in 1904 without 
proof. In 1907 he gave the algebraic proof both of the 
formula for the elimination of observational errors and of a 
formula for the elimination of irrelevant factors (14) (called 
by Yule, who derived it differently, the ‘partial correlation’ 
method). The demonstration, which is rather lengthy, is not
included here since it does not relate directly to our problem. 

Spearman in 1910 revised his formula for correction for
attenuation and in this new version included the formula for
the estimation of reliability. His revised formula was written
as follows (15):

$$r_{xy} = r'_{xy}\,\sqrt{\frac{[1 + (p - 1)\,r_{x_1x_2}]\,[1 + (q - 1)\,r_{y_1y_2}]}{p\,r_{x_1x_2} \cdot q\,r_{y_1y_2}}} \tag{III}$$

It seems that the only difference between the new formula and
the one of 1907 is that the method of determining the reliability
is improved. The expression under the radical in
formula (III), which may be regarded as the denominator of a
fraction whose numerator is $r'_{xy}$, is at once recognized as the
prediction formula given in our Introduction and applied to
the factors p and q for the purpose of obtaining a true estimate
of their reliability. The basis of the formula for the correction
for attenuation is that the size of the accidental errors can be 
measured by the size of the discrepancies between successive 
measurements of the same thing. If, then, we secure an 
approximately true measure of the correlated members we 
can estimate their true relationship. While this procedure is
theoretically sound, the assumptions which underlie it should
be borne in mind and the all too prevalent practice of correct- 
ing for attenuation a mere chance coefficient of correlation 
should be discouraged. 

Brown’s modification of the prediction formula (which is
popularly called “Brown’s formula”) appeared in 1910, and
does not differ from it in essentials. Brown simply wrote the 
formula for the special case where n (the number of parts 
to be summed or the multiple of increase in the number of 
subjects) equalled 2 (1). His proof of this special case is given 
in our Appendix A. In Appendix B a proof of the formula for 
any number of cumulated parts is presented. This proof is 
based upon Spearman’s formula for the correlation of sums 
(16), as derived by Kelley (Statistical Method pp. 196-197). 
The standard scores are not used in this proof, however, and 
the notation is somewhat changed in the interest of simplicity. 

C. Experimental Studies of the Validity of the Prophecy- 
Formula.—The first study of the accuracy of the prophecy- 
formula seems to have been made by Holzinger in 1923 (5). 
For this purpose Holzinger used the Terman group test of 
mental ability, a test composed of ten parts or sub-tests. There 
are two forms of this test, Forms A and B.


Holzinger gave both forms to 135 subjects on successive days. He then correlated
each of the ten parts in Form A with the corresponding parts of Form B, in order to
secure reliability coefficients for the several parts. These coefficients are shown in
Table I.


TABLE I
DATA ON THE RELIABILITY OF THE TERMAN GROUP TEST AS GIVEN BY HOLZINGER

Number of     Reliability     Actual reliability      Theoretical
components    of each part    1 to 10     10 to 1     reliability
 1            .638            .64         .70         .68
 2            .809            .81         .79         .82
 3            .682            .87         .83         .87
 4            .900            .91         .87         .90
 5            .852            .90         .84         .92
 6            .482            .88         .86         .93
 7            .683            .89         .87         .94
 8            .530            .87         .87         .94
 9            .514            .91         .90         .95
10            .702            .92         .92         .96





Holzinger then substituted the mean of these ten correlations of the parts in Spearman’s
formula to predict the reliability of tests 2, 3, 4, . . . 10 times as long as one of the
parts. To obtain the actual reliabilities for comparison with these theoretical
coefficients, he took the sums of two, three, four, etc. parts in Form A and correlated them
respectively with the sums of two, three, four, etc. parts in Form B. Two series of
actual reliabilities were secured by cumulating (a) the parts beginning with 1, and
(b) those beginning with 10. The theoretical and the actual reliabilities thus secured
are shown also in our table. The first line of the table is read thus: the number 1
refers to the first sub-test in Terman’s group-test; the coefficient .638 is the correlation
between sub-test 1 in Form A and sub-test 1 in Form B; the coefficient .64 is the same
correlation (.638); the coefficient .70 is the correlation of sub-test 10 in Form A with
the corresponding part in Form B; and the correlation .68 is the mean of the ten
correlations of the separate parts and is the coefficient substituted in Spearman’s
formula to give the theoretical reliabilities beneath it in the last column of the table.

To quote the author: “The general result appears to be that the reliability
increases very rapidly when the first four or five tests are pooled, but increases thereafter
more slowly than the prediction formula would lead us to expect. Moreover, the
trend seems to be determined chiefly by the number of tests cumulated and is not
affected appreciably by highly reliable tests pooled late in the series” (5, p. 305).
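The theoretical column of Table I may be approximated by substituting the mean part-correlation, .68, in the prediction formula for n = 1 to 10. The sketch below does this; the function and variable names are illustrative assumptions. Because .68 is itself a rounded mean, the figures obtained may differ from the printed column by a unit in the last place.

```python
def spearman_brown(r_parts: float, n: int) -> float:
    """Predicted reliability of n cumulated parts, per the prediction formula."""
    return n * r_parts / (1 + (n - 1) * r_parts)


MEAN_PART_CORRELATION = 0.68  # mean of the ten part correlations, as reported

# Predicted reliability for 1, 2, ..., 10 cumulated parts.
predicted = [spearman_brown(MEAN_PART_CORRELATION, n) for n in range(1, 11)]

for n, value in enumerate(predicted, start=1):
    print(f"{n:2d} parts: {value:.2f}")
```

The successive gains shrink as parts are added, so the curve predicted by the formula flattens with length; Holzinger's actual coefficients flatten sooner and more markedly than these predicted ones.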

In a second study (1925) Holzinger and Clayton applied the Spearman prediction- 
formula to two types of material, the Otis Self-administering test of mental ability 
(heterogeneous material) and a test taken from the Buckingham revision of Ayres 
spelling scale (homogeneous material) (6). 

Forms A and B of the Otis test were given on successive days to college students
(final results based upon 75 cases). Equal time-units of one and one-half minutes 
were secured by having the subjects plainly mark just beneath the problem on which 
they were working when the experimenter called ‘time.’ Ten equal time-groups were 
thus made of each Form of the test. The ten components in each Form were found, 
however, to be of very unequal difficulty and reliability. The reliabilities of these 
parts are shown in Table II. The actual reliabilities for increasing the number of 
cumulated parts were secured by the method described above for the Terman test. 


TABLE II
HOLZINGER AND CLAYTON’S RESULTS FROM THE OTIS SELF-ADMINISTERING TEST

                                              Theoretical reliabilities
Number of    Reliability of    Actual         Initial    Mean of    “Best fit”
component    separate parts    reliability    r          10 r’s     value
 1           .495              .495           .495       .159       .340
 2           [illegible]       .579           .663       .274       .507
 3           .216              .604           .747       .362       .607
 4           .002              .649           .797       .432       .674
 5           [illegible]       .739           .831       .486       .721
 6           [illegible]       .749           .855       .532       .760
 7           -.004             .752           .873       .570       .783
 8           .404              .776           .886       .603       .806
 9           .143              .790           .898       .631       .822
10           -.090             .828           .90[?]     .655       .838











Three series of reliabilities were calculated by substituting successively in the formula
(a) the initial correlation (the correlation between parts 1A and 1B), (b) the mean of
the ten correlations of corresponding parts in the two Forms, and (c) a ‘best’ value
derived from the actual reliability coefficients themselves.²

The authors did not give the theoretical reliabilities, but I have calculated and 
included them in Table II, which contains also the reliabilities of the separate parts 
and the actual reliabilities of the cumulated parts. 

The results show that the initial coefficient, when substituted in the formula, 
regularly gives reliabilities that are too high, while the mean correlation gives predicted 
values that are too low. The predictions based upon the ‘best fit’ value conform 
fairly closely to the actual reliabilities. This is to be expected, however, since the 
best value was derived from the actual reliabilities themselves. This ‘best fit’ method 
is of no value in the actual prediction of reliability, as the authors point out, since it is 
entirely gratuitous to go about predicting a value that has already been found out by 
experiment. The writers use the method merely to show that there is an increase in 
reliability with an increase in the length of a test and that the formula will estimate 
this reliability fairly accurately, provided that a suitable coefficient be chosen for 
substitution in it. 

From the second type of material used in the study the investigators obtained 
much better results with the formula. Two tests of 105 words each (seven cycles of 
fifteen words to the cycle, all the cycles being constructed so as to be of equal difficulty) 
were given to 125 high school students on successive days. The methods used in 
treating these data were the same as those employed for the Otis test. The results 
are shown in Table III. 

The cumulative coefficients show a close conformity to those theoretically derived. 
The authors plot the results and show that in no case is the deviation from the actual 
coefficient as much as one standard deviation. 





2 Ibid., p. 295. . . . Inasmuch as this ‘best fit’ method is used in the present
study, the explanation of it is here given. The aim of the authors was to secure a
simple linear expression of the relationship between the increase in the length of the
test and the corresponding predicted reliability (with such reliability coefficient based
upon a value derived from all of the actual reliability coefficients). “This may be
accomplished by a transformation of equation (1) [the prediction formula, see supra,
p. 69]. If we set a equal to $(1 - r_{xx'})/r_{xx'}$ [the constants in the prediction formula], then there
results at once,

$$Z = \frac{n}{r_{nn}} = a + n.$$

This last expression is at once recognized as an hyperbolic curve in $r_{nn}$ and n, or a
straight line with a slope of + 1.0 in the Z-n plane.” By dividing n by $r_{nn}$ the
value of Z for each of the cumulations of parts was obtained. The value of a for
each of the ten values of Z was found and the average of the ten was taken as the
‘best’ value for a. By substituting this average value of a in the equation

$$a = \frac{1 - r_{xx'}}{r_{xx'}}$$

the value of $r_{xx'}$ was obtained. This is the ‘best’ value that was substituted in the
prophecy formula to predict the reliabilities for increased lengths of the test.
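The ‘best fit’ procedure just described may be sketched as follows. The function name `best_fit_r` is an assumption made for illustration; the ten coefficients are the actual reliabilities of the cumulated parts as read from Table II. The result should fall near the value of .340 given in that table, though small differences in rounding are to be expected.

```python
def best_fit_r(actual_reliabilities: list[float]) -> float:
    """'Best' single-part r derived from the actual cumulated reliabilities.

    For n = 1, 2, ... the value Z = n / r_nn is formed, then a = Z - n.
    The mean of the a values is converted back by a = (1 - r) / r,
    that is, r = 1 / (1 + a).
    """
    a_values = [
        n / r_nn - n
        for n, r_nn in enumerate(actual_reliabilities, start=1)
    ]
    a_mean = sum(a_values) / len(a_values)
    return 1 / (1 + a_mean)


# Actual reliabilities for 1 to 10 cumulated parts (Table II).
actual = [0.495, 0.579, 0.604, 0.649, 0.739,
          0.749, 0.752, 0.776, 0.790, 0.828]

best_r = best_fit_r(actual)
print(round(best_r, 3))
```

If the actual reliabilities followed the prediction formula exactly, every value of a would be the same and the procedure would simply return the part correlation from which they were generated.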





TABLE III
HOLZINGER AND CLAYTON’S RESULTS FROM THE BUCKINGHAM-AYRES
SPELLING MATERIAL

          Reliability    Cumulative      Theoretical
Cycles    of each        reliability     reliabilities
          cycle          coefficients    (initial r)
1         .743           .743            .743 ± .040
2         .737           .841            .853 ± .026
3         .788           .906            .897 ± .015
4         .747           .916            .920 ± .015
5         .816           .941            .936 ± .013
6         .778           .949            .945 ± .011
7         .801           .955            .953 ± .009


In general they conclude “that where the ‘best values’ of $r_{xx'}$ are used instead of
initial or average values there is good agreement between the experimental and the 
theoretical results. This indicates that the Spearman law is behind the trend of the 
data, even though for purposes of prediction it cannot be applied in such cases” 
(6, p. 299). A further inference is that where the material is sufficiently homogeneous 
and of equal difficulty one can predict fairly well with the formula the reliability to be 
expected, regardless of the value substituted in the formula. We shall see presently 
that this is not generally true for all types of material. 

A third study of the validity of the predictions by means of the formula was made
by Kelley, who compared the experimental results of Gordon in the field of lifted
weights (on increasing the number of judges) with the reliabilities predicted from the
formula (9).

Miss Gordon found that the average correlation of the arrangements of the 
weights by single judges with the true order of the weights was .41; that the order 
given by averaging the results of five judges correlated with the true order to the 
extent of .68; that for ten judges the correlation was .79; for twenty judges, .86; 
and for fifty judges, .94 (4). These correlation coefficients are the Rho values. The 
results are summarized in Table IV, and the r equivalents of the Rho values are 
given. 


TABLE IV
SHOWING THE CORRELATION OF ARRANGEMENT OF WEIGHTS WITH THE TRUE ORDER
OF THE WEIGHTS

Number of
judges       Rho      r
 1           .410     .421
 5           .680     .704
10           .790     .803
20           .860     .875
50           .940     .949


Kelley figured the average reliability coefficients for 1, 5, 10, 20, and 50 judges
respectively from the formula, given in his Statistical Method (9, p. 206), for the
correlation between one form of a test and a true score of the function measured by the test.
This formula is written

$$r_{1\infty} = \sqrt{r_{1\mathrm{I}}}.$$

With the experimental reliabilities determined by this method, he compared the
theoretical reliabilities secured by substituting in the prediction formula the value
.177, the average reliability coefficient for the rankings of single judges, letting n
equal 5, 10, 20 and 50. The actual and the theoretical values are shown in Table V.


TABLE V
SHOWING THE CORRELATIONS OF ARRANGEMENTS OF WEIGHTS WITH THE TRUE ORDER
OF THE WEIGHTS

Number       Theoretical     Actual
of judges    reliability     reliability
 5           .518            .493
10           .683            .645
20           .812            .766
50           .915            .901


There is, then, moderately close agreement between the increase in reliability by 
actually increasing the number of judges and that predicted by the formula. The 
data used in the present study were treated in this same manner and no such con- 
formity was found. One cannot conclude, then, that there is, for all kinds of material, 
a close relationship between the number of subjects and the amount of reliability to 
be expected. 
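Kelley's comparison may be retraced in a few lines. The sketch assumes, as the text states, that the reliability of a group of judges is the square of its correlation with the true order, and that .177 is the value substituted in the prediction formula; the function name `spearman_brown` is an illustrative assumption.

```python
def spearman_brown(r_single: float, n: int) -> float:
    """Predicted reliability when the number of judges is multiplied by n."""
    return n * r_single / (1 + (n - 1) * r_single)


# Correlation of a single judge's arrangement with the true order (Table IV).
r_with_true_order = 0.421

# Reliability of a single judge: the square of that correlation.
reliability_single = r_with_true_order ** 2  # approximately .177

# Theoretical reliabilities for 5, 10, 20 and 50 judges, using .177.
for judges in (5, 10, 20, 50):
    print(judges, round(spearman_brown(0.177, judges), 3))
```

The same squaring applied to the r values for larger groups of judges in Table IV gives the actual reliabilities with which these theoretical ones are compared.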


The three studies reported include, as it appears, all the 
experimental work on the validity of the predictions made by 
the use of the Spearman formula. The results are fairly 
meagre, in that a narrow range of material constitutes their 
basis. It is hoped that the present study, in which several 
tests are used, including tests of diverse sorts of ability and 
differing degrees of heterogeneity, will throw additional light 
upon the question.³


3 Two additional studies along the line of the present study have been published
since these experiments were recorded. In the first one Wood (18) applied the Spear- 
man formula to five achievement tests (four true-false tests in French, Pleading and 
Practice, Property and Equity, and a recognition test in French Vocabulary). Wood 
found rather good agreement between the predicted and the actual reliabilities where 
the conditions of equality of standard deviations and means were fulfilled. His 
predictions and actual reliabilities were all figured by correlating parts and cumulations 
of a single test, but not between repeated tests. Due to this fact, his results will 
hardly be comparable with the main body of our own. It will be seen below that 
agreement between the theoretical and the actual reliabilities is much closer where 
parts of one test, and not repeated tests, are correlated. A second study by Ruch, 
Ackerman and Jackson (10) is based upon cycles of spelling material, analogous to 







EXPERIMENTAL 


The tests described below include three types of material; 
(1) a general intelligence test, (2) tests of musical abilities and 
(3) tests of what are called elementary mechanical abilities. 
Altogether there are twelve separate tests. With such a 
diversity of material and with so many tests, it is hoped that 
considerable headway may be made towards determining just 
what can be expected of the prophecy-formula in the estima- 
tion of the reliability of tests. 

Although we accept Spearman’s view of what constitutes
the reliability coefficient, i.e., that it is the correlation between
scores on two comparable tests of a given function, we have,
nevertheless, repeated ten of the twelve tests here used 
instead of giving different but comparable tests. This has 
been done because the nature of all except two of the tests 
is such, we believe, as to obviate the danger of direct memory 
transference from one application to the second. In the case 
of the Otis self-administering test and the substitution test, 
where there was a probability of direct transfer effects, a 
different form was used at the second testing. 


1. The Otis Self-Administering Test—Form A of this test was given to 107 college 
students in classes in psychology at George Peabody College on Nov. yoth, 1925. 
Two weeks later Form B was given. Both tests were given by me and the effort was 
made to have conditions as uniform and auspicious as possible.⁴





the study by Holzinger and Clayton reported above. There were twenty cycles of 
twenty-five words each taken from Ashbaugh’s Iowa spelling scales. The words were 
selected so that cycles of approximately equal difficulty were obtained. These lists 
were given as ‘spelling exercises’ to five-hundred grammar grade pupils. The writers 
report a high degree of correspondence between the actual and the predicted reliabilities. 
This agrees, it will be recalled, with the results secured by Holzinger and Clayton. 
The spelling material uniformly yields high correlations between the separate parts, 
and the amount of increase predicted by the formula is relatively less than is the case 
when the prediction is based upon a low correlation. 

4 This investigation was carried out subsidiary to a study of race differences which
has been pursued during the past two years by Dr. Joseph Peterson and the writer. 
This study was made possible through grants by the Committee on the Scientific 
Problems of Human Migration of The National Research Council. The music tests 
were all given by Dr. Peterson, assisted by the writer, while the latter gave the re- 
mainder of the tests described here. Originally, it was planned to duplicate this study 
of reliability with negro subjects; but the plan had to be abandoned because of the 
inability to secure the subjects for the repetition of the tests. 











80 LYLE H. LANIER 


The test was not divided into as many parts as in the experiment by Holzinger 
and Clayton, partly because it did not seem necessary to duplicate their work and 
partly because the object in this study has been as much to test out the possibilities of 
predicting from one application of a test the reliability to be expected from it as merely 
to determine the amount of reliability to be expected with an increase of its length. 
This point will be discussed more fully in the outline of the methods used. Thirty 
minutes were allowed for the test and the subjects were required to mark the problem 
on which they were working, at intervals of ten minutes. Only the first two ten-minute 
periods were used, however, since many of the subjects finished the test before the 
end of the third period. These two ten-minute periods were further subdivided by 
summing in each one the scores on the odd and the even questions respectively. 

2. The Seashore Musical Abilities Tests —There are six of these tests, including 
tests of pitch, intensity, time, consonance, tonal memory and rhythm. On the first 
three, each subject recorded 100 judgments on pairs of sounds played by a portable 
victrola. On the last three, 50 judgments each were recorded. The judgments were 
recorded on standard mimeographed blanks, each containing 100 squares (ten columns 
and ten rows) for the first three, and 50 squares (ten rows and five columns) for the 
last three tests. Reference to the manual of directions for the tests will show the 
scheme (11). 

The score on each test is the per cent. of correct judgments. In the present 
study the score for each of the ten rows was obtained, thus subdividing each test into 
ten parts that should be approximately equal in difficulty. The table of means given 
below will show the relative difficulties of the parts within each test. 

The first three tests were given to 106 college students on October 29th, 1924, 
and were repeated on March 10th, 1925. The tests for consonance, tonal memory and 
rhythm were given first on October 30th, 1924, and repeated March 25th, 1925. One 
hundred and nine subjects took the last three tests. All tests were given in the chapel 
at the Middle Tennessee State Teachers College of Murfreesboro, Tennessee; students 
of all college classes were included in the groups. 

Before the record for a given test was played the experimenter read out carefully 
the directions given by Seashore for the test. The experimenter emphasized the 
particular tonal quality which the subjects were to notice. Next he illustrated the 
way the judgments were to be made by calling out aloud the correct response for some 
three or four pairs of sounds played. The subjects were then asked to call out aloud 
in unison the judgments for some five or six other pairs of sounds. The purpose of 
this fore-exercise was to familiarize the subjects with the type of discrimination to be 
made and to prevent, as far as possible, a misunderstanding of the directions. After 
the fore-exercise the experimenter reread the directions and the test began. 

3. The Mechanical Abilities Tests —The five tests in this group consisted of a 
tapping test, a speed of movement test, a cancellation test, a substitution test, and a 
second tapping test. These will be briefly described. 

a. Tapping I.—This test was given on Nov. 5th, 1925, to more than 100 college 
students of George Peabody College and was repeated two weeks later. Two classes 
in psychology were used. One hundred and four subjects took both tests, these being 
the same to whom the Otis test was given. 

The test blank contained six blocks of small 5-millimeter squares, each block 
being thirty squares long by seven squares high. The subject was instructed to put 
a dot in each square as rapidly as possible. Fifteen seconds were allowed for each of 
the five blocks, with 30 sec rest between each block. There were thus five equal parts 
to the test. The score was the number of dots made, irrespective of accuracy. 





RELIABILITY OF MENTAL TESTS 




b. Speed of Movement.—This test was given at the same time and to the same 
group of students as Tapping I, immediately following it. 

The subject was required to mark nearly vertical lines as rapidly as possible 
between the two horizontal lines of a long slender rectangular figure. There were 
three groups of four such figures in each group. The subject marked for 15 sec in 
each group and was allowed to rest 30 sec between groups. There were, accordingly, 
three equal parts in the test. 

c. Cancellation.—This test was given to the same students as the preceding; 
but inasmuch as a few subjects finished the test before the time-limit expired, and 
were therefore excluded from the present study, the number of cases was reduced 
to 101. 

The subject was required to cross out the A’s in a page of printed capitals, dis- 
tributed in chance order, as rapidly as possible. Two minutes were allowed for the 
test. At the end of the first minute the experimenter called ‘Mark,’ and the subjects 
made a mark to indicate the point at which they were working. Two equal divisions 
of the test were thus secured. 

d. The Substitution Test.—Two forms were used. A standard digit-symbol 
form was given to the same individuals taking the three preceding tests on November 
5th, 1925. A symbol-digit form was given to the same groups two weeks later, 105 
persons taking both tests. Five minutes were allowed for the test and five 
subdivisions secured by having the subjects mark at 1-min intervals to show where they 
were working. Inasmuch as several persons completed the test before the expiration 
of the last time limit, only the first four parts were included in the present study. 

e. Tapping II.—This test was not given at the same time nor to the same indi- 
viduals as the four next preceding. It was given first to a class in psychology on 
February 15th, and repeated February 22d, 1926. Sixty-nine subjects took both 
tests. 

The test-blank consisted of nine rows of squares, each row containing six one- 
inch squares. In this test, however, only five squares in each row and only five of the 
rows were used. 


TABLE VI 

NUMBER OF SUBJECTS, NUMBER OF PARTS, AND TIME ELAPSING BETWEEN THE FIRST 
AND THE SECOND TESTING, FOR ALL TESTS 

                        Number of   Number of   Time between first 
Test                    subjects    parts       and second testing 
Otis                    107          2          2 weeks 
Pitch                   106         10          4½ months 
Intensity               106         10          4½ months 
Time                    106         10          4½ months 
Consonance              109         10          4½ months 
Tonal Memory            109         10          4½ months 
Rhythm                  109         10          4½ months 
Tapping I               104          5          2 weeks 
Speed of Movement       105          3          2 weeks 
Cancellation            101          2          2 weeks 
Substitution            105          4          2 weeks 
Tapping II               69          5          1 week 













The subjects were instructed to tap as rapidly as possible. They tapped for 3 
sec in a square and then went to the next one without stopping when the experimenter 
called ‘Shift.’ This procedure was repeated until the row was finished. Between 
the successive rows there was a rest period of 30 sec. 

The following table will summarize some of the facts about the tests used. 

Methods Used.—A part of the present study consisted in the application of the 
methods used by Holzinger to the data from the above tests. It will be recalled that 
he correlated each part in test A (used here and throughout this paper to denote the 
first application of a test; ‘test B’ referring to the second application or the application 
of a second form) with the corresponding part in test B to get the reliability of the 
separate parts. The average of all these correlations of the parts was figured. The 
theoretical reliabilities for tests 2, 3, 4, ... n times the length of one part were then 
found by substituting in the prophecy-formula either (1) the initial coefficient or the 
reliability of the first part, (2) the mean of the reliabilities of all the parts, or (3) the 
‘best fit’ value as determined by the method described above on page 76 (footnote). 
All three of these values have been used for all tests in this investigation with the 
exception of the Otis test, where there was not a sufficient number of parts to justify it. 
The special way in which the scores from this test were handled will be described below 
with the results. 
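
The substitution step just described may be sketched as follows. This is an illustration only: the function name `prophecy` is hypothetical, and the formula is written in the familiar Spearman-Brown form, r_n = n r / (1 + (n - 1) r), which is assumed to be the form of the prophecy-formula used in this paper.

```python
def prophecy(r, n):
    """Estimated reliability of a test n times as long as a part
    whose reliability is r (Spearman-Brown form, assumed here)."""
    return n * r / (1 + (n - 1) * r)


# Illustrative value only (not taken from the data): a part reliability
# of .50 stepped up to tests 2, 3 and 4 times the length of one part.
for n in (2, 3, 4):
    print(n, round(prophecy(0.50, n), 3))
```

Any one of the three values named above (initial, mean, or ‘best fit’) would be passed as `r`; only the choice of `r` differs between the three series of predictions.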

Inasmuch as the music tests and some of the others 
contained several parts, it was decided to try out the predictions 
made from the formula within a single test. The procedure 
may be illustrated with the pitch test. Part 1A was 
correlated with part 2A, and part 2A with part 3A. The mean of 
these two correlations was obtained. (A more representative 
mean value would have been given by correlating each part 
in the test with every other part and taking the mean of all 
of them, but this would have meant an enormous amount of 
calculation and probably would not have repaid the effort.) 
Both this mean and the initial correlation (1A with 2A) were 
substituted in Spearman’s formula, letting n equal 2, 3, etc., 
according to the number of actual reliabilities it was possible 
to obtain for comparison. From these actual reliabilities a 
‘best’ value was obtained and this, too, was substituted in the 
formula. There were thus three series of theoretical 
reliabilities analogous to those secured when both test A and test 
B were used. The actual reliabilities were secured by 
summing 1A and 2A and correlating the sum with the sum of 
3A and 4A, (1 + 2 + 3)A with (4 + 5 + 6)A, etc. The 
number of actual reliabilities thus secured varied, of course, 
with the number of parts in the test. For all of the music 


tests there were five, for three of the mechanical abilities tests 







and for the Otis test there was one each and for two of the 
mechanical abilities tests (cancellation and speed of move- 
ment) there was none. 
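
The cumulation of parts within a single test may be sketched in code. This is a minimal sketch under stated assumptions: the data layout (one list of scores per part, one score per subject) and the function names are hypothetical, and the product-moment coefficient is computed directly rather than by any method the writer may have used.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def actual_reliability(parts, k):
    """Correlate the sum of parts 1..k with the sum of parts k+1..2k.

    `parts` is a list of part-score lists (hypothetical layout):
    parts[i][j] is the score of subject j on part i + 1.
    """
    first = [sum(scores) for scores in zip(*parts[:k])]
    second = [sum(scores) for scores in zip(*parts[k:2 * k])]
    return pearson(first, second)


# Toy scores for four subjects on four parts (invented for illustration).
toy_parts = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4], [2, 2, 3, 5]]
print(actual_reliability(toy_parts, 1))  # part 1 with part 2
print(actual_reliability(toy_parts, 2))  # (1 + 2) with (3 + 4)
```

A test of ten parts yields five such coefficients (k = 1 to 5), which agrees with the number stated for the music tests.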

Inasmuch as the primary reason for using the prediction 
formula is to obtain a reliability coefficient from a single 
application of the test, it seemed of especial importance to 
determine the possibilities in that respect with the data from 
these tests. Hence, for each of the tests, where the number 
of parts admitted of such procedure, a subdivision of it has 
been made into equal parts by two methods, and the two 
equal parts correlated in both instances to give coefficients 
that might be used to predict the reliability of the whole test. 
In the first place, the first half (where there was an odd 
number of parts in the test, the last one has been omitted, 
thus leaving a test that might be equally subdivided) has 
been correlated with the second. Secondly, the odd elements 
in each test have been summed and the result correlated 
with the sum of the even elements. The coefficients resulting 
from correlating the halves of the test as secured by these 
two methods of subdivision have been substituted in 
Spearman’s formula, letting n equal 2, to give the predicted 
reliability for the whole test. These theoretical reliabilities 
were then compared with the actual reliabilities found when 
tests A and B were correlated. 
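
The two methods of subdivision may be sketched as follows. The data layout and function names are hypothetical, and the sketch assumes that the last part is dropped for both methods when the number of parts is odd (the text states this only for the first method). The step-up with n equal to 2 is written as 2r / (1 + r).

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def total(group):
    """Sum each subject's scores over the parts in `group`."""
    return [sum(scores) for scores in zip(*group)]


def split_half_predictions(parts):
    """Return the predicted whole-test reliability from (1) first half
    with second half and (2) odd parts with even parts."""
    if len(parts) % 2 == 1:      # odd number of parts: omit the last one
        parts = parts[:-1]
    half = len(parts) // 2
    r_halves = pearson(total(parts[:half]), total(parts[half:]))
    r_odd_even = pearson(total(parts[0::2]), total(parts[1::2]))
    # Step each half-test correlation up to full length (n = 2).
    return 2 * r_halves / (1 + r_halves), 2 * r_odd_even / (1 + r_odd_even)


# Toy scores for four subjects on four parts (invented for illustration).
toy_parts = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4], [2, 2, 3, 5]]
print(split_half_predictions(toy_parts))
```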

A fourth aspect of this study has been to determine the 
predictions made with the formula when the increase is in 
the number of subjects rather than in the length of the test. 
This has been done for every test we have used by dividing 
the first 100 subjects in each test (with the exception of the 
second tapping test where there were only 69 cases and 
where the first 60 subjects were taken) into five groups of 20 
persons each. For the first three such groups the correlations 
between tests A and B have been derived. The average of 
these three correlations was then substituted in the prophecy-
formula, letting n equal 2, 3, 4 and 5. This gives the 
theoretical reliabilities for 40, 60, 80, and 100 cases. The actual 
and the predicted reliabilities were then compared for all 
the tests. 
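
The procedure for an increase in the number of subjects may be sketched in the same way. The function name and the three group correlations below are invented for illustration; the formula is again assumed to be of the form n r / (1 + (n - 1) r).

```python
def predict_for_more_subjects(group_rs, group_size=20, max_groups=5):
    """Average the A-with-B correlations of the first few groups and
    step the average up for 2, 3, ... max_groups groups of subjects.

    Returns a dict mapping number of cases to predicted reliability.
    """
    mean_r = sum(group_rs) / len(group_rs)
    return {
        n * group_size: n * mean_r / (1 + (n - 1) * mean_r)
        for n in range(2, max_groups + 1)
    }


# Invented correlations for the first three groups of 20 subjects.
for cases, r in predict_for_more_subjects([0.40, 0.50, 0.60]).items():
    print(cases, round(r, 3))
```

For the second tapping test, where only 60 subjects were taken, `max_groups` would be set to 3 under this sketch.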













The probable errors of the predicted values have been 
calculated from a formula given by Shen (12). Holzinger 
objects to this formula as being only an approximation; 
but it is much shorter than the one which he proposes and the 
very many probable errors to be figured here made it almost 
necessary, in the time available, to use the shorter formula. 
It is written as follows (in our usual notation). 


$$\mathrm{P.E.}_{r_{nn}} = .6745\,\frac{n\,(1 - r_{1I}^{2})}{\sqrt{N}\,[1 + (n - 1)\,r_{1I}]^{2}}$$
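
The probable-error formula may be sketched in code. The sketch assumes that the printed formula reads P.E. = .6745 n (1 - r²) / (√N [1 + (n - 1) r]²), where r is the coefficient substituted, n the number of times the length is increased, and N the number of subjects; the function name is hypothetical. Under this reading the formula gives about .043 and .036 for r = .507, N = 106, with n = 2 and n = 3, which agrees with the probable errors printed for the pitch test.

```python
from math import sqrt


def probable_error(r, n, N):
    """Probable error of a predicted reliability, as the formula is
    read here (an assumption about the garbled printed expression)."""
    return 0.6745 * n * (1 - r ** 2) / (sqrt(N) * (1 + (n - 1) * r) ** 2)


# Initial coefficient of the pitch test, 106 subjects.
for n in (2, 3):
    print(n, round(probable_error(0.507, n, 106), 3))
```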


THE RESULTS 


The results are presented in two sections. In the first, 
the success of the formula in predicting the reliability to 
be expected when the length of the test is increased is 
considered. In the second, the actual and the predicted 
reliabilities are compared when the increase is in the number of 
subjects used. Accordingly, two tables of data are presented 
for each of the tests, one showing the actual and predicted 
reliabilities within a single test, as described above, the other 
the reliabilities when both the first and the second tests are 
used. In this second table the three series of predicted values 
and the predictions based upon the correlations of the halves 
of the first test are included. In addition to the tables con- 
taining these comparisons, a table of the means of the several 
parts and one with their separate reliabilities is given. 


Section I. Reliability with an Increase in Length of the Test 


The Otis Self-Administering Test.—It was mentioned above 
that only two divisions of equal time-limits were secured for 
this test and that each of these was further divided into 
halves by summing separately the odd and the even elements. 
The means of the several parts involved in the reliability 
study are shown in the following table. 

These means for the parts confirm Holzinger’s results in 
showing that the Otis test increases in difficulty, the mean 
score for the second period being approximately half that for 
the first. They show also that test B is somewhat easier 








TABLE VII 

MEANS OF THE PARTS OF THE OTIS TEST 

Part                  Test    Mean 
First 10 min          A       28.950 
                      B       27.820 
Second 10 min         A       13.680 
                      B       16.754 
Odds 1st 10 min       A       14.476 
                      B       14.776 
Evens 1st 10 min      A       14.926 
                      B       14.590 
Odds 2d 10 min        A        6.930 
                      B        8.475 
Evens 2d 10 min       A        7.210 
                      B        8.888 


than test A, or else that practice in taking test A was an 
advantage in taking the second test. 

The possibilities were limited in trying out the predictive 
value of the formula within test A. Inasmuch as the two 
parts of this test were divided into odd and even elements, 
it was possible to correlate separately these elements for the 
first and for the second ten minutes; to take an average of 
these two correlations, and then to substitute both this 
average value and the initial correlation in Spearman’s 
formula, letting n equal 2, to get the reliability for an entire 
ten-minute period. This has been done and the results are 
shown in Tables VIII and IX. The actual reliabilities of the 
test were found by correlating the first half (1st 10-min) 
with the second (2nd 10-min) and by correlating the odds with 
the evens for the whole test. 


TABLE VIII 

CORRELATIONS OF THE ODD WITH THE EVEN ELEMENTS FOR OTIS TEST A (FIRST TEST) 

Parts correlated                        r 
Odds with evens (1st 10) ............. .92 
Odds with evens (2nd 10) ............. .565 
Mean of the two correlations ......... .743 

TABLE IX 

OTIS TEST; ACTUAL AND PREDICTED RELIABILITIES WITHIN THE FIRST TEST. PREDICTED 
RELIABILITIES ARE BASED UPON THE VALUES FROM TABLE VIII 

     Actual reliabilities               Predicted reliabilities 
1st with      Odds test A            Initial        Mean 
2nd half      with evens A           r              r 
.................. .623              .958±.005      .853±.019 















From Table VIII it is readily seen that the second part of 
the test is much less reliable than the first, as well as more 
difficult. There is a marked over-prediction⁵ when either 
the correlation of the odds and evens of the first part or the 
mean of such correlations of both parts is used in the formula, 
as Table IX shows. The low correlation between the first 
and the second parts of the test is surprising, in view of 
the high reliability of the whole test. It is probable that the 
speed factor influences this correlation. If the brighter 
students were also faster in answering the questions, as we 
may assume, they would work through the easier problems 
and would be working on more difficult ones, during the 
second period, than the slower, less bright persons. Hence 
the slower persons who scored lower on the first half would 
score relatively higher on the second half, due to the lesser 
difficulty of the problems on which they were working. Care 
should be taken in subdividing tests of this sort (i.e. with a 
total time-limit and where the problems are of increasing 
difficulty) for the purpose of predicting their reliability. 
With such tests the odd-even method is better, since parts of 
more nearly equal difficulty are thus secured. 

We shall now consider the predicted and the actual 
reliabilities when both test A and test B are used (Table X). 


TABLE X 

SHOWING THE CORRELATIONS OF THE PARTS IN TEST A WITH CORRESPONDING PARTS 
IN TEST B (OTIS TEST) 

Part                                        Correlation between 
                                            test A and test B 
........................................... ... 
........................................... .508 
........................................... .635 


In Table XI the actual reliability of the Otis test is 
compared with the theoretical reliabilities based upon the 
correlations of the halves of test A as given by the two 
methods, the correlation of the first half with the second, and 
the sum of the odd with the even elements of the whole 
test. 


⁵ The term ‘over-prediction’ is used when the predicted value is higher than the 
actual value, ‘under-prediction’ when it is lower. 






















TABLE XI 

OTIS TEST; THE ACTUAL AND PREDICTED RELIABILITIES WHEN BOTH 
TEST A AND TEST B ARE USED 

                                          Predicted reliabilities 
Parts            Actual         Initial      Mean         r from 1st half     r from odds 
correlated       reliability    r            r            with 2nd half A     with evens A 
Tests A and B    .823           .850±.019    .770±.030    .479±.101           .770±.030 


All the predicted values lie within less than two probable 
errors of the actual reliability of the test, except the one based 
upon the correlation of the first half with the second half of 
test A. The reason for the low predicted value in this last 
case has been indicated above. Hence we may conclude that 
with tests of this sort one can fairly well predict the reliability 
to be expected when the length of the test is increased, 
provided a suitable value be selected for substitution in the 
formula. Where only one application of the test is available, 
as is the case in the practical situation where an estimate of 
the reliability is desired, the test should be divided by the 
method of cumulating the odd and the even elements or by 
some other method that will eliminate the effect observed 
when the first and second halves were correlated. 
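
The comparison in probable-error units may be checked with a few lines of arithmetic. The values are those of Table XI as read here (actual reliability .823); the function name is hypothetical.

```python
def deviation_in_pe(actual, predicted, pe):
    """Distance between predicted and actual reliability, expressed
    in units of the probable error of the predicted value."""
    return abs(predicted - actual) / pe


ACTUAL = 0.823
# (predicted value, probable error) for each basis of prediction.
PREDICTIONS = {
    "initial r": (0.850, 0.019),
    "mean r": (0.770, 0.030),
    "first half with second half": (0.479, 0.101),
    "odds with evens": (0.770, 0.030),
}

for label, (value, pe) in PREDICTIONS.items():
    ratio = deviation_in_pe(ACTUAL, value, pe)
    print(f"{label}: {ratio:.1f} probable errors")
```

Only the prediction from the first half with the second half exceeds two probable errors, in agreement with the statement above.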

The Tests of Musical Abilities —The results for the music 
tests are shown in Tables XII to XXVI. The means for the 
parts used in this study are given, for all six tests, in Table 
XII. It is seen there that the parts within any test are 


TABLE XII 

MEANS OF THE PARTS (AND TOTALS) OF THE MUSICAL ABILITIES TESTS. THE SCORE 
IS THE PER CENT. OF CORRECT JUDGMENTS 

Part          Pitch    Intensity   Time     Conson.   Ton. M.   Rhythm 
First test 
1             6.73     9.20        8.05     7.96      7.02      8.58 
2             6.78     8.80        8.35     7.44      7.09      8.61 
3             7.49     8.62        8.35     8.54      6.67      8.93 
4             7.2      9.18        8.26     7.83      6.78      8.06 
5             6.94     8.73        8.28     7.42      7.02      7.49 
6             7.37     8.94        8.26     7.18      6.67      8.45 
7             7.44     9.08        8.28     6.56      7.65      8.5 
8             6.98     8.11        7.87     6.85      6.36      7.71 
9             6.94     8.44        8.2      8.70      6.95      7.41 
10            6.91     9.12        7.77     7.15      6.56      7.88 

Whole test    66.56    80.34       76.94    65.94     59.14     71.91 
















TABLE XII—Continued 


Part          Pitch    Intensity   Time     Conson.   Ton. M.   Rhythm 
Second test 
1             7.39     9.07        7.64     7.62      6.84      8.36 
2             6.87     8.77        8.25     8.17      7.15      8.54 
3             7.68     8.42        8.18     8.12      6.02      8.65 
4             7.47     8.32        8.14     8.10      6.53      7.55 
5             7.28     8.05        7.87     7.09      6.59      7.46 
6             7.38     8.01        7.91     7.35      6.11      7.42 
7             7.60     8.80        8.15     6.08      6.19      7.92 
8             7.24     8.06        8.09     6.40      6.30      7.66 
9             7.05     8.45        7.83     8.24      5.96      7.02 
10            6.77     8.69        7.54     8.08      6.84      7.96 

Whole test    68.35    84.57       75.14    65.70     55.20     70.11 


more nearly of equal difficulty than is the case with the Otis 
test. This is due in part to the nature of these tests and in 
part to the method of subdivision. The successive parts are 
not consecutive in a temporal sense and there is consequently 
no practice effect such as we find in some of the tests of 
mechanical abilities. The method of selecting rows auto- 
matically gives, in the case of all the tests except consonance 
and rhythm, cross-sections in which there are ten elements 
each of difficulty comparable with that of corresponding 
elements in every other part of the cross-section. The fact 
that the mean of test B for four of the tests is lower than the 
mean of test A shows that there was no direct transfer-effect 
from the first to the second. 

Table XIII contains the correlations for all six tests upon 
which the predictions of reliability within a single test are 


TABLE XIII 

CORRELATIONS OF 1A WITH 2A AND OF 2A WITH 3A, AND THE MEAN OF THESE TWO 
CORRELATIONS, FOR THE SIX SEASHORE MUSICAL ABILITIES TESTS 

r              Pitch   Intensity   Time    Conson.   Ton. M.   Rhythm 
1A with 2A     .507    .356        .238    .072      .400      .227 
2A with 3A     ...     .134        .180    .084      .328      .130 
Mean           .498    .240        .209    .078      .364      .178 


based. Table XIV contains the correlations of each part in 
test A with the corresponding part in test B. It is from 
these two tables that the ‘initial’ and the ‘mean’ values 
substituted in Spearman’s formula are taken. 













TABLE XIV 

CORRELATIONS OF EACH PART IN THE FIRST TEST WITH THE CORRESPONDING PART IN 
THE SECOND TEST FOR THE SIX MUSICAL ABILITIES TESTS 

Part     Pitch   Intensity   Time    Conson.   Ton. M.   Rhythm 
1        .416    .260        .298    .179      .332      .210 
2        .550    .007        .056    .096      .414      .250 
3        .411    .209        .000    .412      .420      .218 
4        .389    .326        .160    .227      .402      .114 
5        .475    .232        .309    .377      .216      .095 
6        .376    .154        .167    .295      .188      .140 
7        .272    .180        .346    .172      .338      .028 
8        .343    .352        .164    .091      .464      .200 
9        .638    .171        .285    .348      .388      .197 
10       .488    .318        .263    .326      .388      .078 

Mean     .438    .221        .204    .252      .359      .153 


For each of the six tests two tables are given below. The first in each case shows 
the actual and the predicted reliabilities when only the parts of test A are cumulated. 
The second gives the results when the parts of both tests are used. The method has 
been fully described above. 

a. Pitch.—The conformity between the actual and the theoretical reliabilities 
when the increase in length is secured at the same sitting is close, as Table XV shows. 


TABLE XV 

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE PITCH TEST 
WHEN INCREASED IN LENGTH AT THE SAME SITTING 

Number       Actual                  Predicted reliabilities 
of parts     relia-     1. Using initial   2. Using mean of    3. Using ‘best 
cumulated    bility     coefficient        two correlations    fit’ values 
1            .507       .507 P.E.          .498 P.E.           .494 P.E. 
2            .710       .673±.043          .666±.045           .662±.044 
3            .640       .756±.036          .750±.037           .740±.038 
4            .800       .804±.031          .800±.032           .794±.032 
5            .839       .837±.026          .834±.027           .831±.028 

With the exception of the case where three parts are cumulated, the actual relia- 
bilities lie within less than one probable error of the predicted. The initial, the mean, 
and the ‘best fit’ values are here so nearly identical that the predictions are practically 
the same for all three. 

The actual and the predicted reliabilities based upon correlations between repeated 
tests are shown in Table XVI. 

In this case the predicted values are far larger than the actual when the number 
of parts cumulated exceeds two (for both initial and mean values). The ‘best fit’ 
value (.247) gives an under-prediction at first; but when more than four parts are 
cumulated, the predicted reliabilities exceed the actual. The agreement is not so 
close as Holzinger found with the Otis test; although the actual coefficients fall within 
two probable errors (approximately) of the theoretical. 













TABLE XVI 

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE PITCH TEST 
WHEN INCREASED IN LENGTH AT DIFFERENT SITTINGS 

Number      Actual                        Predicted reliabilities 
of parts    relia-    Initial      Mean of      ‘Best fit’   r from        r from odds 
cumulated   bility    r            10 r’s       value        halves of A   and evens A 
1           .416      .416 P.E.    .438 P.E.    .247 P.E.    .839 P.E.     .812 P.E. 
2           .600      .580±.055    .610±.053    .396±.079 
3           .528      .685±.048    .703±.045    .496±.082 
4           .666      .740±.043    .759±.040    .568±.080 
5           .675      .718±.038    .795±.035    .622±.077 
6           .628      .810±.034    .822±.032    .665±.073 
7           .680      .833±.031    .846±.027    .695±.070 
8           .652      .850±.028    .862±.026    .725±.066 
9           .676      .865±.026    .875±.023    .748±.063 
10          .680      .878±.024    .885±.022    .768±.059    .915±.010     .897±.013 


It is interesting to note that the reliability of the pitch test does not increase 
after about five parts are cumulated; although Table XIV shows that elements 
cumulated later are just as reliable as the first five. This means that a test one-half 
as long as the present pitch test would give just as reliable results. 

The last two columns of Table XVI show the theoretical reliabilities based upon 
correlating the halves of test A and the odd and even elements of that test. The 
over-prediction is even greater in these two cases than in those just discussed. The results 
are about the same for the two methods of dividing test A. 

In general it appears that, within a single test of the sort here used, one can predict 
rather accurately the reliability to be expected with increasing the length. In such 
a case conditions and attitudes of the subjects may be assumed to be more or less 
constant. On the other hand, when one attempts to predict the reliability for a 
second application of the test there are, apparently, changed attitudes and conditions 
that affect the relative standings of the subjects and thus lower the correspondence 
between the two sets of results. 

b. Intensity —In Table XVII the actual and the theoretical reliabilities within 
test A are compared. It is seen that the initial coefficient gives reliabilities that are 
too high while the mean and the best fit values both give theoretical coefficients that 
agree fairly well with the actual values (the mean and ‘best’ values are identical in 
this case). 

TABLE XVII 

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE INTENSITY 
TEST WHEN INCREASED IN LENGTH AT THE SAME SITTING 

Number       Actual                  Predicted reliabilities 
of parts     relia-     1. Using initial   2. Using mean of    3. Using ‘best 
cumulated    bility     coefficient        two correlations    fit’ values 
1            .356       .356 P.E.          .240 P.E.           .240 P.E. 
2            .393       .525±.062          .387±.079           .387±.079 
3            .399       .622±.058          .487±.085           .486±.085 
4            .550       .690±.053          .558±.083           .558±.083 
5            .610       .734±.048          .613±.080           .612±.080 


The results for two applications are shown in Table XVIII. 










TABLE XVIII 

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE INTENSITY 
TEST WHEN INCREASED IN LENGTH AT DIFFERENT SITTINGS 

Number      Actual                        Predicted reliabilities 
of parts    relia-    Initial      Mean of      ‘Best fit’   r from        r from odds 
cumulated   bility    r            10 r’s       value        halves of A   and evens A 
1           .260      .260 P.E.    .221 P.E.    .153 P.E.    .610 P.E.     .667 P.E. 
2           .238      .413±.077    .362±.084    .265±.097 
3           .495      .514±.079    .460±.090    .351±.112 
4           .306      .585±.077    .530±.089    .419±.118 
5           .438      .639±.073    .587±.088    .475±.124 
6           .513      .678±.069    .630±.085    .520±.125 
7           ...       .712±.065    .665±.081    .560±.124 
8           .603      .738±.061    .693±.077    .593±.122 
9           .573      .760±.058    .717±.073    .618±.117 
10          .597      .778±.055    .740±.069    .645±.115    .758±.032     .799±.026 


Both the initial and the mean values give over-predictions. The reliabilities 
predicted from the best value agree fairly well with the actual coefficients. This is 
no more than we expect, however, since the best fit value is derived from the latter. 

Regarded in general, the results from this test are closely similar to those obtained 
from the pitch test. The predictions based upon the correlations of the halves of 
test A, whether the first half with the second or the odd with the even elements, are 
much too high, as was the case with pitch. The predictions within a single test are 
good, but this conformity is not secured when the predictions are based upon 
correlations of parts given at two sittings. 

c. Time.—In the time test, the reliabilities predicted from the initial and the 
mean values correspond fairly closely with the actual values secured from correlations 
within test A. The latter coefficients are within one probable error distance of the 
theoretical values. The correlations based upon the ‘best fit’ value do not agree so 
well with the actual reliabilities. This is due to the low reliability (.133) for the 
cumulation of two parts, which makes the ‘best fit’ value unusually low. 


TABLE XIX 

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE TIME TEST 
WHEN INCREASED IN LENGTH AT THE SAME SITTING 

Number       Actual                  Predicted reliabilities 
of parts     relia-     1. Using initial   2. Using mean of    3. Using ‘best 
cumulated    bility     coefficient        two correlations    fit’ values 
1            .238       .238 P.E.          .209 P.E.           .147 P.E. 
2            .133       .384±.081          .346±.089           .209±.094 
3            .420       .483±.091          .442±.092           .340±.111 
4            .451       .556±.090          .514±.094           .408±.115 
5            .593       .614±.088          .570±.092           ...±.121 


In Table XX we see the same sort of over-prediction found above with the test 
for pitch and intensity. The theoretical coefficients based upon all three values 
regularly exceed the actual, although the latter fall within the limits of one probable 













error of the ‘best fit’ predictions. In no case is this true of the values based upon the 
mean or the initial coefficients. The mean of the ten correlations furnishes a better 
basis for prediction than the initial correlation, although for both series the deviation 
of the value nearest the actual reliability is at least two probable errors. 


TABLE XX 

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE TIME TEST 
WHEN INCREASED IN LENGTH AT DIFFERENT SITTINGS 

Number      Actual                        Predicted reliabilities 
of parts    relia-    Initial      Mean of      ‘Best fit’   r from        r from odds 
cumulated   bility    r            10 r’s       value        halves of A   and evens A 
1           .298      .298 P.E.    .204 P.E.    .110 P.E.    .593 P.E.     .540 P.E. 
2           .303      .459±.071    .339±.087    .198±.106 
3           .287      .560±.070    .433±.096    .270±.130 
4           .303      .630±.066    .505±.098    .331±.147 
5           .416      .680±.062    .562±.097    .381±.157 
6           .361      .718±.057    .606±.094    .426±.161 
7           .408      .750±.053    .643±.091    .464±.162 
8           .445      .772±.050    .673±.087    .497±.164 
9           .477      .792±.046    .698±.084    .527±.164 
10          .495      .810±.043    .721±.081    .553±.162    .745±.033     .703±.039 


Again the predictions based upon the odd and even elements and those from the
correlation of the two halves of test A are far too high. The two series of predicted
values are not very different, as Table XX shows, the odd-even series being slightly
nearer the true reliabilities.
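
The predicted columns in these tables come from substituting a base correlation in Spearman's formula. A minimal Python sketch follows, assuming the familiar Spearman-Brown form r_n = n r / (1 + (n - 1) r); the function name `spearman_brown` is illustrative and does not come from the paper.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability of a test n times as long as one of reliability r.

    Assumes the standard Spearman-Brown form n*r / (1 + (n - 1)*r).
    """
    return n * r / (1 + (n - 1) * r)


# Initial coefficient for the time test (Table XX), stepped up to 2..10 parts.
initial_r = 0.298
predicted = {n: round(spearman_brown(initial_r, n), 3) for n in range(2, 11)}

print(predicted[2])  # 0.459
```

With the initial coefficient of .298 this gives .459 for two parts, as in Table XX. Later entries may differ from the printed ones in the third decimal, presumably because of rounding in the original hand computation.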

d. Consonance.—In predicting the reliabilities within the first consonance test, 
it is apparent from Table XXI that the values upon which the theoretical coefficients 
are based are too low, i.e., that they are not representative of the real situation, for
the actual reliabilities exceed the predicted in every case. The true reliabilities, 
however, fall within the limits of one probable error from the theoretical values. 
Attention should be called to the very large probable errors of the predicted series. 
The reliability of the latter is very doubtful. 


TABLE XXI

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE CONSONANCE
TEST WHEN INCREASED IN LENGTH AT THE SAME SITTING

Number                     Predicted reliabilities
of parts     Actual        1. Using initial   2. Using mean of     3. Using ‘best
cumulated    reliability      coefficient        two correlations     fit’ values
    1        .072          .072  P.E.         .078  P.E.           .113  P.E.
    2        .223          .134 ±.113         .145 ±.109           .203 ±.104
    3        .357          .198 ±.142         .202 ±.142           .277 ±.129
    4        .344          .237 ±.175         .257 ±.165           .337 ±.145
    5        .446          .280 ±.182         .297 ±.180           .390 ±.154


In Table XXII the results based upon correlations in the two tests are shown.
The ‘best fit’ value, as usual, furnishes the most accurate series of predictions, the
true reliability lying always within less than one probable error of the predicted
coefficient. Of the other two series, the one based upon the initial value is more
accurate. In this case, the true reliabilities, with a single exception, lie within two
probable errors of the theoretical. For the first half of the test the conformity is
closer.


TABLE XXII

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE CONSONANCE
TEST WHEN INCREASED IN LENGTH AT DIFFERENT SITTINGS

Number                     Predicted reliabilities
of parts     Actual        Initial      Mean of        ‘Best fit’   r from        r from odds
cumulated    reliability   r            10 r’s         value        halves of A   and evens A
    1        .179          .179  P.E.   [illegible]    .120  P.E.   .446  P.E.    .330  P.E.
    2        .158          .304 ±.089   .403 ±.077     .214 ±.102
    3        .353          .396 ±.102   .503 ±.081     .290 ±.127
    4        .405          .466 ±.105   .574 ±.079     .352 ±.143
    5        .466          .522 ±.105   .626 ±.075     .406 ±.152
    6        .510          .567 ±.104   .670 ±.071     .450 ±.158
    7        [illegible]   .606 ±.101   .700 ±.068     .489 ±.160
    8        .470          .637 ±.098   .728 ±.065     .522 ±.161
    9        .461          .664 ±.094   .752 ±.061     .552 ±.161
   10        .543          .686 ±.092   .770 ±.059     .577 ±.160   .617 ±.049    .496 ±.096


The prediction based upon the correlation of the halves of test A is too high,
but the actual reliability lies within two probable errors of it. The odd-even prediction
is lower than the actual reliability, the only time this occurs in this study, but the
actual value lies within one probable error of it.

e. Tonal Memory.—Within test A the conformity between the actual and the
predicted reliabilities is less close than for any of the tests thus far considered. As
Table XXIII shows there is little difference between the predictions based upon the
mean and those based upon the initial coefficient. The former are somewhat better.
When only two parts are cumulated the actual reliability and the prediction based
upon the initial coefficient are practically identical, but as the length of the test is
increased, the theoretical series increases much more rapidly than the experimental.
The ‘best fit’ value yields the most accurate theoretical coefficients, a fact which,
as we have seen, is generally true.


TABLE XXIII

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE TONAL
MEMORY TEST WHEN INCREASED IN LENGTH AT THE SAME SITTING

Number                     Predicted reliabilities
of parts     Actual        1. Using initial   2. Using mean of     3. Using ‘best
cumulated    reliability      coefficient        two correlations     fit’ values
    1        .400          .400  P.E.         .364  P.E.           .292  P.E.
    2        .579          .572 ±.055         .535 ±.061           .452 ±.069
    3        .555          .668 ±.051         .633 ±.057           .554 ±.068
    4        .615          .728 ±.045         .696 ±.053           .623 ±.064
    5        .602          .769 ±.039         .742 ±.047           .673 ±.060













Table XXIV contains the comparisons of the coefficients based upon correlations
of parts in both tests with the true reliabilities. There is great over-prediction in
the two series based upon the initial and the average values, after as many as four
parts have been cumulated. The initial and the mean coefficients are practically
identical in value. The ‘best fit’ series rather closely parallels the actual reliabilities
throughout.


TABLE XXIV

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE TONAL
MEMORY TEST WHEN INCREASED IN LENGTH AT DIFFERENT SITTINGS

Number                     Predicted reliabilities
of parts     Actual        Initial       Mean of       ‘Best fit’    r from        r from odds
cumulated    reliability   r             10 r’s        value         halves of A   and evens A
    1        [illegible]   [illegible]   [illegible]   [illegible]   [illegible]   [illegible]
    2        .430          .498 ±.065    .528 ±.061    .338 ±.087
    3        .540          .600 ±.062    .627 ±.057    .435 ±.094
    4        .529          .666 ±.058    .692 ±.052    .506 ±.096
    5        .570          .714 ±.053    .738 ±.047    .563 ±.095
    6        .577          .750 ±.049    .771 ±.044    .606 ±.092
    7        .596          .778 ±.045    .796 ±.040    .642 ±.089
    8        .605          .798 ±.042    .819 ±.037    .674 ±.086
    9        .637          .819 ±.038    .834 ±.034    .698 ±.082
   10        .669          .832 ±.035    .848 ±.031    .720 ±.080    .752 ±.032    .852 ±.028


The prediction from the correlation of the halves of test A is better than the
odds-evens prediction. The latter is far too high, while the actual reliability lies 
within less than two probable errors of the former. 

f. Rhythm.—The reliability of this test is very low and the predicted coefficients
are, as a rule, far in excess of the actual, both within test A and when both tests are
used. In the first case, as Table XXV shows, the very large probable errors of the
theoretical coefficients render them practically worthless. These probable errors, in
the case of the series based upon the mean and upon the ‘best fit’ values, are so large
that the actual reliabilities usually fall within less than two probable errors of the
theoretical. Where the reliability of the latter is so doubtful, however, this fact does
not mean very much.


TABLE XXV

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE RHYTHM
TEST WHEN INCREASED IN LENGTH AT THE SAME SITTING

Number                     Predicted reliabilities
of parts     Actual        1. Using initial   2. Using mean of     3. Using ‘best
cumulated    reliability      coefficient        two correlations     fit’ values
    1        .227          .227  P.E.         .178  P.E.           .114  P.E.
    2        .336          .370 ±.081         .303 ±.089           .203 ±.077
    3        [illegible]   .468 ±.086         .394 ±.102           .227 ±.129
    4        .278          .540 ±.085         .457 ±.105           .337 ±.145
    5        .280          .595 ±.084         .520 ±.110           .390 ±.155

The predictions based upon both tests A and B are far too high when the initial
or the mean values are used. The predictions from the mean value (Table XXVI)
are fairly good for the first half of the test, but after that they increase much more
rapidly than the actual. The latter, in fact, do not increase, being in this respect like
the test for pitch. A test less than one-half as long as the present would, apparently,
be just as reliable.


TABLE XXVI

COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES OF THE RHYTHM
TEST WHEN INCREASED IN LENGTH AT DIFFERENT SITTINGS

Number                     Predicted reliabilities
of parts     Actual        Initial      Mean of        ‘Best fit’   r from        r from odds
cumulated    reliability   r            10 r’s         value        halves of A   and evens A
    1        .210          .210  P.E.   [illegible]    .096  P.E.   [illegible]   .353  P.E.
    2        .268          .347 ±.090   .265 ±.096     .175 ±.105
    3        .385          .444 ±.092   .351 ±.112     .241 ±.132
    4        .443          .516 ±.094   .415 ±.117     .298 ±.149
    5        [illegible]   .571 ±.092   .475 ±.127     .347 ±.158
    6        .369          .616 ±.088   .520 ±.123     .389 ±.170
    7        .373          .651 ±.084   .560 ±.123     .426 ±.174
    8        .340          .681 ±.081   .592 ±.120     .460 ±.177
    9        .350          .706 ±.077   .620 ±.117     .488 ±.178
   10        .425          .727 ±.074   .645 ±.115     .515 ±.177   .437 ±.072    .520 ±.062


It is interesting to note that the correlation for the two halves of test A, when
substituted in the formula, gives a predicted reliability for the whole test that almost 
coincides with the true reliability. The odd-even prediction is too large; but the 
actual coefficient nevertheless lies within two probable errors of it. 


Summary for the Musical Tests 


1. Predictions within one test. 

For the majority of tests used here (four of the six) the
actual reliabilities given by correlating increased lengths
within one test lie within one probable error of the corresponding
predicted reliabilities. The ‘best fit’ values are
slightly more accurate than this and are so for all six tests.

2. Predictions based on two applications of the tests.

a. In all six tests there is marked over-prediction when
either the initial or the mean coefficient is used.

b. The ‘best fit’ value gives predictions that generally include
the actual reliability, within a range of from one to two
probable errors.

c. In five of the six tests there is marked over-estimation
when the correlation of the halves of test A is substituted in
the formula. The same is true of the predictions based upon
the correlation of the odd and even elements. Inasmuch as
these two sets of predictions are the ones of most practical
import, the fact that they are far too high is interesting.
It would seem that one should be very cautious in estimating
the result to be expected with repeating a test from a value
secured under the relatively constant conditions that prevail
at a single testing.

d. In three of these tests the correlation of the first half 
with the second furnishes the more accurate basis for pre- 
diction, while in the other three the odd-even correlation 
leads to better estimations of reliability. 

e. The results from the tests for pitch, consonance and 
rhythm show that all three might be decreased in length at 
least one-half with no loss in reliability. 

Taking the results at large, one is struck by the variability 
of the actual reliabilities with increasing lengths of the tests. 
As we saw, increase in length means increase in reliability 
throughout the series for only three of the tests; and the 
increase for these three (intensity, time and tonal memory) 
was far from the regular increase predicted from the formula. 
Contrary to Crum’s contention, there is a problem involved
in determining the reliability of a test of such a simple per- 
formance as the discrimination of the pitch or the loudness 
of two tones. 

The Tests of Mechanical Abilities.—The means for all parts
of the five tests are shown in Table XXVII. In these tests,
as contrasted with the musical tests, there is improvement 
with practice, shown by the increase in the size of the means 
of the successive equal parts. This is particularly true of the 
first tapping test and of the speed of movement. 


TABLE XXVII

MEANS OF THE PARTS OF THE MECHANICAL ABILITIES TESTS

Part            Tapping I     Speed M.   Cancel’n   Substit’n   Tapping II
First test
  1               53.88         78.45      67.26      34.71        99.68
  2               63.80         76.99      64.43      29.66        99.09
  3               70.24         79.64                 29.86       100.75
  4               74.50                               31.58       104.25
  5               76.96                                           104.72
  Whole test     337.30        230.39     129.25     142.28       507.93
Second test
  1               84.71         81.94      70.23      33.89       104.33
  2               88.79         83.83      66.45      32.60       103.79
  3               91.29         89.35                 33.94       102.20
  4             [illegible]                           37.54       103.13
  5               98.74                                           103.34
  Whole test     452.30        254.20     138.20     154.47       515.23


Tables XXVIII and XXIX contain the correlations of the 
parts upon which the predicted reliabilities for the five tests 
of mechanical abilities are based. The first gives the corre- 
lations of the parts within a single test, while the second shows 
the correlations of each part in test A with the corresponding
part in test B. 


TABLE XXVIII

CORRELATIONS OF 1A WITH 2A AND OF 2A WITH 3A, AND THE MEAN OF THESE TWO
CORRELATIONS, FOR THE FIVE MECHANICAL ABILITIES TESTS

                 Tapping   Speed of   Cancel-   Substi-   Tapping
r                I         Movement   lation    tution    II
1A with 2A       .730      .779       .772      .682      .671
2A with 3A       .861      .905                 .854      .746
Mean             .795      .842       .772      .768      .708


TABLE XXIX

CORRELATIONS OF EACH PART IN THE FIRST TEST WITH THE CORRESPONDING PART IN
THE SECOND TEST FOR THE FIVE MECHANICAL ABILITIES TESTS

                 Tapping   Speed of   Cancel-   Substi-   Tapping
Part             I         Movement   lation    tution    II
1                .372      .675       .623      .469      .522
2                .412      .697       .655      .495      .61?
3                .494      .705                 .536      .573
4                .505                           .479      .626
5                .416                                     .667
Mean             .439      .692       .639      .495      .599


Inasmuch as there were fewer parts in each of these tests
than in the musical tests, the data on comparisons of the
actual and the theoretical reliabilities within test A are
considerably limited. In three of the tests (Tapping I,
Substitution and Tapping II) a single comparison for each
is possible. The cancellation and the speed of movement
tests with only two and three parts, respectively, did not
admit of any correlation between cumulated parts within
test A, and consequently of any comparison of actual and
theoretical reliabilities. As for the other three tests, only
the initial and the mean values have been substituted in
Spearman’s formula. The ‘best fit’ value could not be used
because there were not enough actual reliabilities from which
to derive it.

The data involving both test A and test B, however, have
been handled exactly as were those for the music tests, with 
the exception noted above that where there was an odd 
number of parts in a test to prevent its even division the last 
part was omitted, the first half of the remainder being corre- 
lated with the second half. 
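
The halving rule just described can be sketched in Python. This is only an illustration of the stated rule (drop the last part when the count is odd, then sum each half); the function name `split_halves` and the sample scores are hypothetical and are not data from the study.

```python
def split_halves(part_scores):
    """Sum the first and second halves of one subject's part scores.

    When the number of parts is odd, the last part is omitted so that
    the remainder divides evenly.
    """
    usable = len(part_scores) - (len(part_scores) % 2)
    half = usable // 2
    return sum(part_scores[:half]), sum(part_scores[half:usable])


# Five hypothetical part scores: parts 1+2 against parts 3+4, part 5 omitted.
first, second = split_halves([10, 12, 11, 13, 14])
print(first, second)  # 22 24
```

The two half-scores, taken over all subjects, would then be correlated, and that correlation substituted in the formula for a test twice as long.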

a. Tapping I.—The actual reliability coefficient for a test twice as long as one
of the parts is compared with the theoretical reliability in Table XXX. Both the
initial and the mean values give significant over-predictions.


TABLE XXX

TAPPING I. ACTUAL AND PREDICTED RELIABILITIES WITHIN ONE TEST

Number of parts    Actual         Predicted reliability
cumulated          reliability    Initial coef.    Mean of 2 coef.
      1            .730           .730  P.E.       .795  P.E.
      2            .745           .844 ±.024       .887 ±.015


Table XXXI contains the actual and the predicted reliabilities based upon the 
correlations of the corresponding parts in the two tests. The predictions based upon 
the initial coefficient agree well with the actual reliabilities until more than three parts 
are cumulated. Beyond this number the theoretical values are too large. A glance 
at the actual reliability series shows that no increase in reliability is secured by in- 
creasing the length of the test beyond three times the length of one part. The pre- 
dictions based upon the mean value are even larger than the predictions from the 
initial correlation. The ‘best fit’ value yields predictions that are in fairly good 
agreement with the true reliabilities; although the conformity is not so close as one
might expect.

TABLE XXXI

TAPPING I. ACTUAL AND PREDICTED RELIABILITIES WITH REPEATED TESTS

Number                     Predicted reliabilities
of parts     Actual        Initial      Mean of      ‘Best fit’    r from (1+2)    r from odds
cumulated    reliability   r            5 r’s        value         with (3+4)A     and evens A
    1        .372          .372  P.E.   .439  P.E.   [illegible]   [illegible]     [illegible]
    2        .547          .542 ±.061   .610 ±.051   .485 ±.068
    3        .615          .641 ±.057   .702 ±.045   .568 ±.069
    4        .612          .706 ±.051   .760 ±.040   .653 ±.064    .855 ±.019
    5        .630          .748 ±.047   .798 ±.035   .702 ±.057                    .912 ±.011







Due to the odd number of parts of the test, the sum of (1 + 2)A was correlated
with the sum of (3 + 4)A and the reliability predicted from this value is compared
with the actual reliability of the sum of four parts. There is a very great over-prediction
by this method; and the over-prediction is even greater when the correlation
of the odds with the evens is the basis for prediction.

b. Speed of Movement.—Inasmuch as there were only three parts in the test no 
internal comparisons were possible. Table XXXII shows the series of actual and 
predicted reliabilities based upon correlations of parts in both tests. It is seen here, 
as we have found for all the musical tests as well, that the initial and the mean values 
lead to significant over-predictions while the ‘best fit’ value gives a series of reliabilities 
that agree moderately well with the experimental values. The correlation of the odds 
and the evens of test A when substituted in the formula predicts a reliability that is
much too high. The same is true for the correlation of 1A with 2A, whose predicted
reliability is compared with the actual reliability of two cumulated parts. 


TABLE XXXII

SPEED OF MOVEMENT. ACTUAL AND PREDICTED RELIABILITIES WITH REPEATED TESTS

Number                     Predicted reliabilities
of parts     Actual        Initial      Mean of      ‘Best fit’   r from        r from odds
cumulated    reliability   r            3 r’s        value        halves of A   and evens A
    1        .675          .675  P.E.   .692  P.E.   .598  P.E.   .779  P.E.    .862  P.E.
    2        .720          .807 ±.025   .818 ±.024   .748 ±.031   .876 ±.017
    3        .789          .864 ±.019   .871 ±.018   .819 ±.024                 .927 ±.009


c. Cancellation.—There were not enough parts in this test, as also in the speed
of movement, to afford a comparison of real and theoretical reliabilities within test A.
In Table XXXIII the comparisons for the two tests are shown. All three predicted
values are too high. The actual reliability lies within two probable errors of the
‘best fit’ prediction; but it is much further away from the other two series. The
prediction based upon the correlation of the two halves of test A is even greater than
the last two.


TABLE XXXIII

CANCELLATION. ACTUAL AND PREDICTED RELIABILITIES WITH REPEATED TESTS

Number                     Predicted reliabilities
of parts     Actual        Initial      Mean of      ‘Best fit’   r from        r from odds
cumulated    reliability   r            2 r’s        value        halves of A   and evens A
    1        .623          .623  P.E.   .639  P.E.   .538  P.E.   .772  P.E.    (only 2
    2        .640          .768 ±.031   .780 ±.030   .700 ±.040   .870 ±.017    parts)


d. Substitution.—Table XXXIV shows that the initial coefficient leads to fairly
accurate prediction within test A; while the mean of two correlations gives a coefficient
that is far too large.


TABLE XXXIV

SUBSTITUTION. ACTUAL AND PREDICTED RELIABILITIES WITHIN ONE TEST

Number of parts    Actual         Predicted reliabilities
cumulated          reliability    Initial r       Mean of 2 r’s
      1            .682           .682  P.E.      .768  P.E.
      2            .795           .812 ±.025      .870 ±.017









The comparisons based upon tests A and B are shown in Table XXXV. The
results are similar to those found for practically all of the tests considered thus far;
namely, the theoretical reliabilities based upon the initial and upon the mean coefficients
are too high, those given by the ‘best fit’ value are somewhat nearer the true reliabilities,
and the predictions both from the correlation of the halves of test A and of
the odds and evens are much too great.


TABLE XXXV

SUBSTITUTION. ACTUAL AND PREDICTED RELIABILITIES WITH REPEATED TESTS

Number                     Predicted reliabilities
of parts     Actual        Initial      Mean of      ‘Best fit’   r from (1+2)    r from odds
cumulated    reliability   r            4 r’s        value        with (3+4)A     and evens A
    1        .469          .469  P.E.   .495  P.E.   .328  P.E.   .795  P.E.      .785  P.E.
    2        .568          .639 ±.047   .663 ±.044   .495 ±.066
    3        .571          .727 ±.041   .746 ±.037   .594 ±.064
    4        .545          .778 ±.036   .797 ±.032   .663 ±.059   .887 ±.015      .886 ±.015


e. Tapping II.—In this test the predicted and the actual reliabilities agree better
when repeated tests are used than when only one test is involved. As Table XXXVI
implies, both the initial and the mean values give predictions that are too high when
compared with actual reliabilities secured within test A. On the other hand, the
theoretical and the true reliabilities from correlations of corresponding parts in the
two tests agree closely. The predictions based upon the mean value conform less
closely for the several cumulations than do those of the two other series, being regularly
too high.


TABLE XXXVI

TAPPING II. ACTUAL AND PREDICTED RELIABILITIES WITHIN ONE TEST

Number of parts    Actual         Predicted reliabilities
cumulated          reliability    Initial r       Mean of 2 r’s
      1            .671           .671  P.E.      .708  P.E.
      2            .658           .804 ±.032      .830 ±.028


TABLE XXXVII

TAPPING II. ACTUAL AND PREDICTED RELIABILITIES WITH REPEATED TESTS

Number                     Predicted reliabilities
of parts     Actual        Initial      Mean of      ‘Best fit’   r from (1+2)    r from odds
cumulated    reliability   r            5 r’s        value        with (3+4)A     and evens A
    1        .522          .522  P.E.   .599  P.E.   .486  P.E.   .658  P.E.      .834  P.E.
    2        .628          .703 ±.051   .749 ±.042   .655 ±.058
    3        .675          .786 ±.039   .818 ±.034   .740 ±.049
    4        .794          .832 ±.034   .856 ±.027   .793 ±.042   .795 ±.034
    5        .868          .863 ±.026   .882 ±.023   .825 ±.038                   .910 ±.015


The correlation of (1 + 2)A with (3 + 4)A, when substituted in the formula to
find the reliability of a test twice as long as one of these sums, gives a predicted reliability
practically identical with the corresponding actual value. The odd-even correlation
in test A gives too high a reliability; although it is not so far out as we have found
in the preceding tests.










Summary for the Mechanical Tests 


1. Predictions within a single test. 

The fact that only three of the tests contained enough subdivisions
to admit of comparisons within test A and that in
these three tests only one such comparison was possible
somewhat limits the conclusion respecting the correspondence
within a single test between the theoretical and the actual
reliabilities. For two of these tests there was an over-prediction
when either the initial or the mean correlation
was used. For the substitution test the initial correlation
gave a reliability in good agreement with the actual reliability.
The mean of two correlations of the parts gave predicted
values that were too high in all three tests.

2. Predictions based on both test A and test B.

a. With the exception of the second tapping test, both the
initial and the mean coefficients gave theoretical reliabilities
that were too high in every case.

b. The actual reliabilities usually fall within one probable
error of the best-fit predictions.

c. Only one direct comparison of the relative accuracy of
the predictions based upon the correlations of the halves of
test A with those from the correlation of the odd and even
elements is possible. This is in the substitution test, and
there the two values are almost identical. With a single
exception, all the predictions from correlations of parts of
test A are too high when compared with the actual reliabilities
determined by repeating the tests.

d. Only two of the tests show any material increase in
reliability with an increase in length (the speed of movement
and the second tapping test). For the remaining three tests
there is no appreciable increase in reliability with increased
length beyond the cumulation of the first two parts.


Section II. Prediction of the Reliability for an Increased Number
of Subjects

Kelley’s treatment of Gordon’s data showed that in the
field of lifted weights an increase in the number of judges
resulted in an increase in reliability agreeing fairly closely
with that predicted by Spearman’s formula. Our data have
been used to test out this proposition. The method has
already been described. Tables XXXVIII to XL present the
results for the twelve tests here used. The numbers of subjects
upon which the correlations are based are 20, 40, 60, 80, and
100. Three separate correlations for 20 subjects (i.e., the
first, the second, and the third twenty of the first sixty
subjects arranged in alphabetical order) were figured and the
average of these three was the value substituted in the
formula in each case. From these the predicted reliabilities
for 40, 60, 80 and 100 subjects were worked out. The
results are shown as follows.

1. The Otis Test.—Here the correlations for the increasing
numbers of subjects, although slightly below the mean corre- 
lation for 20 subjects, on an average, remain fairly constant 
throughout the series. They are, of course, far below the 
theoretical values. 


TABLE XXXVIII

OTIS TEST. COMPARISON OF THE ACTUAL AND THE PREDICTED RELIABILITIES
WITH INCREASING THE NUMBER OF SUBJECTS

Number                          Predicted reliability,
of            Actual            substituting mean of
subjects      reliability       3 r’s for 20 S’s
   20         1.  .789
              2.  .810          .808
              3.  .825          r      P.E.
   40             .822          .893 ±.039
   60             .850          .930 ±.027
   80             .794          .944 ±.021
  100             .790          .954 ±.017
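
The predicted column of Table XXXVIII can be approximated with a short Python sketch. It assumes that the number of subjects is treated as the lengthening factor in the Spearman-Brown form n r / (1 + (n - 1) r), with 20 subjects as the unit; the names `spearman_brown` and `predicted_by_subjects` are illustrative only.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when the base of reliability r is multiplied n-fold."""
    return n * r / (1 + (n - 1) * r)


mean_r_20 = 0.808  # mean of the three correlations for 20 subjects (Table XXXVIII)

# Treat 40, 60, 80 and 100 subjects as 2, 3, 4 and 5 times the base of 20.
predicted_by_subjects = {
    subjects: round(spearman_brown(mean_r_20, subjects / 20), 3)
    for subjects in (40, 60, 80, 100)
}

for subjects, r in predicted_by_subjects.items():
    print(subjects, r)
```

The values so obtained lie close to the printed predictions; small differences in the third decimal may reflect rounding in the original computation or damage in the scanned table.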


2. The Musical Abilities Tests.—In five of these tests the
actual reliability for 100 subjects is less than the mean of
three correlations for 20 subjects. Only in the case of the
intensity test is there any semblance of an increase in reliability
with an increase in the number of cases.

3. The Mechanical Abilities Tests.—With the exception of
the substitution test, reliability seems to increase somewhat
with each increase in the number of subjects, but in no instance
is it as much as the predicted reliability.


TABLE XXXIX

MUSICAL ABILITIES TESTS. COMPARISON OF THE ACTUAL AND THE PREDICTED
RELIABILITIES WITH INCREASING THE NUMBER OF SUBJECTS. THE VALUE
SUBSTITUTED IS THE MEAN OF 3 CORRELATIONS; P.E.’S ARE
GIVEN DIRECTLY UNDER PREDICTED r


\ 
ar mah fise- -* 71 te - _ I 
a , i i_—- + 
$03 .729 .316 .485 .475 .48 293 .564 .568 .702 .410 .49 


4 sac S44 S12 022 S O49 <4. F~ 723 x 7/ S26 4 ‘yf 
= .07 +.1I5 x. U5 =.§27 zr .07¢ t.15 
ty SIO .S9g eee 74 455 726 12 FOS ae | x79 ) 74! 
+.952 x .I32 t.134 r.102 z 39 S.834 
SC 90 .QGI5 .0O9 .795 335 -792 33 S35 712 .gOos 402 .79% 
+ .O4!I 313 r+.II4 r.O55 z 47 =.839 
Ic 7OI .933 37 53300 e451 O83 a | 867 OS y22 OS Bz 
x .034 t .O9S t .OQ9 r.O72 } t .OG* 


TABLE XL

MECHANICAL ABILITIES TESTS. COMPARISON OF THE ACTUAL AND THE PREDICTED
RELIABILITIES WITH INCREASING THE NUMBER OF SUBJECTS. THE VALUE
SUBSTITUTED IS THE MEAN OF 3 CORRELATIONS; P.E.’S ARE
GIVEN DIRECTLY UNDER PREDICTED r


20 30 761 778 75 75 
674 fy >) .298 HOS 456 H17 thf f 41 yf “4! 
gos =>) cme soy M27 
40 452 .75! 559.750 622 = .705 635 .782 S35 I 
+.102 +.108 + .105 t .Ogs t 
HO 61% S34 s7hys S25 4 6 2 f21 S34 xa 22 
+ .05 tt .055 t .053 r .07¢ t .o2f 
SO 745 572 728 Kf)? (2 xf) f,2 ide! 
+ .066 +.O7I + .0O5 + 59 
100 Ocse SQ2 726 xx a> RQ fyqf ) 
_ 35 = 39 = 5° ¥- 


ea 


These results seem to show quite convincingly that there is
no such general increase in reliability as the formula yields.
Indeed, in seven of the twelve tests the reliability for 100
subjects is somewhat less than that for 20. Where the number
of subjects is very small (e.g., 1 or 5, as in Gordon’s study) an
increase in the number of cases will eliminate the possibility
of extreme scores due to accidental errors affecting the
results so materially and the errors of observation will tend
to cancel each other, so to speak. But there seems to be no
justification for the view that increasing the number of cases
will give reliability that increases in corresponding proportion.


GENERAL SUMMARY


A general summary of the results is given in Tables
XLI and XLII, and in Figure 1. Table XLI shows the
average amount of deviation of the predicted from the actual 
reliabilities for all the tests. The values in the table are 
absolute correlation units. Inasmuch as the comparisons of 
actual and predicted reliabilities obtained within a single 
test were incomplete for all except the musical tests, these 
results are not included in the tables. Hence the tables and 
the figure contain only the differences between actual and 
predicted reliabilities for both applications of the tests. 
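
As an illustration of how such an average deviation might be formed, the following Python sketch takes the time test at different sittings (Table XX) and averages predicted minus actual over the cumulations of two to ten parts. It assumes the summary entries are simple signed averages of this kind; the function name `mean_deviation` is illustrative.

```python
# Time test, different sittings (Table XX), cumulations of 2 to 10 parts.
actual = [.303, .287, .303, .416, .361, .408, .445, .477, .495]
pred_mean = [.339, .433, .505, .562, .606, .643, .673, .698, .721]


def mean_deviation(predicted, observed):
    """Average signed difference, predicted minus actual, in correlation units."""
    diffs = [p - a for p, a in zip(predicted, observed)]
    return sum(diffs) / len(diffs)


print(round(mean_deviation(pred_mean, actual), 3))  # 0.187
```

The result, .187, agrees with the entry given for the time test under the mean value in Table XLI.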


TABLE XLI

SHOWING AVERAGE AMOUNTS OF OVER-PREDICTION FOR THEORETICALLY DERIVED
COEFFICIENTS BASED (1) UPON INITIAL VALUE, (2) UPON MEAN VALUE,
(3) UPON ODDS-EVENS CORRELATION AND (4) UPON THE
CORRELATION OF THE HALVES OF TEST A

Test                     Initial    Mean       Odds-evens   Halves
Otis                     .027       — .053     .033         .344
Pitch                    .135       .153       .217         .235
Intensity                .172       .132       .202         .161
Time                     .298       .187       .208         .250
Consonance               .108       .206       — .047       .074
Tonal Memory             .145       .167       .183         .083
Rhythm                   .210       .131       .095         .012
Average                  .178       .162       .158         .136
Tapping I                .088       .117       .282         .225
Speed of M.              .163       .140       .142         .09?
Cancellation             .128       .140       .230         .230
Substitution             .153       .174       .341         .342
Tapping II               .071       .085       .042         .073
Average                  .120       .131       .207         .192

Average for all tests    .141       .140       .181         .157










In this table the deviations of the predicted from the 
actual coefficients appear to be less for the initial value than 
for the mean in the case of the Otis and the mechanical 
abilities tests. For the musical tests the mean coefficient
seems to yield, on the average, slightly better predictions.
This method of combining the differences for the several tests 
tends to obscure the variations among the several tests and 
should be regarded as only a gross method of summarization. 

Table XLI shows also that when a single test is sub- 
divided and the parts correlated to give the coefficient for 
prediction, the deviations given by the two methods of subdivision
are about the same. The odds-evens method is
slightly better than the method of correlating the first half 
with the second (eliminating from the general average the 
negative deviation in the case of the Otis test, for the reason 
given above on p. 87). 

It was mentioned above that, for practical purposes of 
prediction, the initial, the mean, and the ‘best fit’ methods 
are all inapplicable. They are merely means for testing out 
the Spearman law which implies that an increase in the length 
of a test gives a corresponding increase in its reliability. 
The use of these three methods requires that the test be 
repeated, and if this must be done the reliability can be 
determined directly by correlating the scores on the two 
applications. In the situation where one wishes to estimate 
the reliability of a test on the basis of one application of it, 
some such method as correlating the odd elements with the 
even or the first half with the second must be used. The 
wide deviations shown in Table XLI for all the tests (except 
the Otis, the consonance and the second tapping test) where 
the odds-evens correlation was substituted in the formula 
indicate that these methods must be employed with the 
greatest caution.
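
The single-application estimate discussed here can be sketched in Python as follows. The sketch assumes a product-moment correlation between each subject's odd-item and even-item totals, stepped up by the formula for a test twice as long; the function names and the small score matrix are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_reliability(item_scores):
    """item_scores holds one list of item scores per subject."""
    odds = [sum(s[0::2]) for s in item_scores]   # items 1, 3, 5, ...
    evens = [sum(s[1::2]) for s in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odds, evens)
    # Step up to the full length (twice the length of either half).
    return 2 * r_half / (1 + r_half)


# Hypothetical scores for four subjects on six items (1 = right, 0 = wrong).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
estimate = odd_even_reliability(scores)
print(round(estimate, 3))
```

An estimate obtained this way rests on a single sitting, which is the condition under which the tables above show the predictions running too high.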

Table XLII contains a summary of the deviations of the 
predicted reliabilities for the successive cumulations of parts.
Only the averages for the three types of test are graphically 
shown (Fig. 1). The value for the Otis test is not included 
in the figure, since there was only one cumulation of parts. 










TABLE XLII

THE AVERAGE AMOUNTS OF DEVIATION (FOR THE THREE TYPES OF TEST) OF THE
PREDICTED FROM THE ACTUAL RELIABILITIES. THE DEVIATIONS ARE GIVEN
BOTH FOR THE ‘INITIAL’ AND THE ‘MEAN’ PREDICTIONS

                   Number of parts cumulated
Tests              2      3      4      5      6      7      8      9      10

A. Deviations of coefficients based upon initial value
Music              .?07   .202   .?67   .?43   .207   .212   .227   .240   .287
Mechan. Abil.      .093   .080   .170   .061
Otis               .027

B. Deviations of coefficients based upon mean value
Music              .086   .0?8   .145   .132   .177   .191   .209   .221   .20?
Mechan. Abil.      .123   .124   .154   .041
Otis               .053
[Figure 1. Vertical axis: amount of deviation (.000 to .250). Horizontal axis: number of parts cumulated. Curves: Music Tests (initial), Music Tests (mean), Mechanical Tests (initial), Mechanical Tests (mean).]


FIG. 1. Curves showing amounts of deviation (in correlation units) of the
predicted from the actual reliabilities. The six music tests and the five mechanical
abilities tests are averaged, respectively. All deviations are positive (‘over-predictions’).


The values along the X axis in the figure are the 
successive cumulations of parts in the series, while on the Y 
axis the deviation of the predicted from the actual 
reliability in terms of correlation units is shown. The base 
line may be regarded as coinciding with the actual reliability 








RELIABILITY OF MENTAL TESTS 107 


of any given cumulation of parts, thus having a value of 0 
deviation. We have seen that practically all the deviations 
have been positive (‘over-prediction’) and, of course, the 
averages of these for the two types of test are positive. 
The figure indicates that with the music tests both methods 
yield closer conformity of the theoretical to the actual values 
early in the series of cumulations than late. This means 
that as the tests are increased in length the ‘over-prediction’ 
becomes greater and greater. The ‘over-prediction’ of values 
based upon the initial coefficient is somewhat greater than 
for those given by the mean coefficient. 

In the tests of mechanical abilities the number of tests 
averaged at the successive cumulations of parts is not con- 
stant, since, as Table XXIX showed, the number of sub- 
divisions differed for the several tests of this group. All five 
tests had as many as two parts, four as many as three, three 
as many as four and two as many as five parts. The fact 
that only two tests are used in the cumulation of five parts 
may account for the sudden drop in the curve showing the 
amount of deviation. Except for this drop the amount of 
deviation is about the same as for the musical tests. The 
initial coefficient is the better value to use for prediction, 
judging by the first two cumulations (i.e., the cumulation of 
two and of three parts); but for the cumulations of four and 
of five parts the mean is superior in that the amount of 
deviation is less. In the majority of instances the differences 
between the deviations of predictions based upon the two 
values, initial and mean, are slight. 
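The deviations discussed above are differences between a predicted and an actual coefficient at each cumulation of parts. A minimal sketch of that bookkeeping follows. It assumes that a prediction for k parts is obtained by entering a single base coefficient (whether the ‘initial’ or the ‘mean’ value) into the prediction formula with n = k; the function names and all figures are hypothetical and are not taken from the tables of this study.

```python
def spearman_brown(r, n):
    """Predicted reliability of n cumulated parts, given base coefficient r."""
    return n * r / (1 + (n - 1) * r)

def over_prediction(r_base, actual):
    """Deviation (predicted minus actual) for each number of parts cumulated.

    actual maps the number of parts cumulated to the observed reliability.
    A positive value is an 'over-prediction'.
    """
    return {k: spearman_brown(r_base, k) - r_obs
            for k, r_obs in sorted(actual.items())}

# Hypothetical figures only: a base coefficient of .45 and three observed
# reliabilities that rise more slowly than the formula predicts.
actual = {2: 0.55, 3: 0.60, 4: 0.62}
deviations = over_prediction(0.45, actual)
```

Running the same function once with the initial coefficient and once with the mean coefficient gives the two rows of deviations that such a comparison requires.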

It may be well here to call attention to some of the general 
implications and applications of this study, inasmuch as it is 
hoped that the results presented will serve to stimulate a 
desirable interest in the problem of the reliability of tests, 
the second most important fact about a test to be determined, 
as Kelley puts it. Persons engaged in the construction of 
tests, particularly, might use these methods in analyzing the 
results with reference to a determination of the reliability of 
the materials which they propose to use. Such procedure 
would no doubt lead to a material shortening of some of the 
lengthy and tiresome tests that are now being inflicted upon 
students. 

With reference to the particular problem of estimating the 
reliability of a test by means of the prediction formula, the 
results show that it is not an unum necessarium for purposes of 
research in reliability of tests. In only two of the twelve tests 
used in this study (the Otis test and Tapping II) is there any 
approach to conformity between the actual coefficients and 
the complete series of predicted values. A consideration of 
some of the factors influencing the reliability coefficients in 
these tests will probably show why the mathematical device 
is not infallible. 

The fact that the several kinds of test used in this study 
vary in reliability indicates that the conditions operating to 
determine their respective reliabilities are neither uniform nor 
constant. The musical tests generally have low reliability, 
and increasing the length of the test does not contribute very 
much towards raising it. This low reliability is probably due 
in some measure to the fact that the discrimination of sounds 
is a process easily disturbed by accidental noise, inattention or 
nervousness. In a group test, where there is always difficulty 
in controlling such factors, they are likely to influence the 
results very materially. These musical tests, too, become 
very tiresome to many subjects; and at the second testing 
these persons, who were interested when the test was first 
given and who tried hard to make the discriminations, do not 
always seem to try when the tests are repeated. Indeed, 
several persons frankly told the experimenter that they came 
to ‘hate’ the musical tests; others that they were driven 
almost frantic by the monotony of the sounds, etc. The 
fact that the means for four of the six musical tests are lower 
at the second testing than at the first seems in some measure 
to confirm this view. These factors, of course, would lower 
the relative standing of such persons in the group and consequently 
would decrease the correlation between the two sets 
of results. Increasing the length of such a test would do 
nothing except, possibly, further to disturb the reliability by 
having more and more persons become fatigued and nervous. 










It is possible, though this is not asserted as a fact, that there 
is some relationship between sensitivity to differences in 
tones and sounds and a tendency toward ‘nervousness.’ In 
this case, increasing the length of the test would only result 
in lowering the records of persons who earlier (or at another 
testing) had made good records and consequently in lowering 
the reliability coefficient. And within a single test the fact 
that many judgments have to be made about tones or sounds 
which the subject cannot discriminate at all contributes very 
much to the distaste for the tests and may operate to engender 
unfavorable attitudes and to predispose persons who tried 
hard at the first testing against doing their best when the 
test is repeated. 

With the Otis test the situation is quite different. An 
‘intelligence’ test is one on which individuals regularly wish to 
rank high. A person who does not object to a low record on a 
musical test shrinks at the thought of a low score on an 
‘intelligence’ test. As a consequence, the majority of persons 
work hard here, and will do the same when it is repeated. 
These tests are thus the most reliable that we have. Where 
two forms are used the elements are new each time and this 
may enhance the interest. 

The so-called mechanical abilities tests (speed tests) are 
somewhat more reliable than the musical tests and less 
reliable, on the whole, than the Otis intelligence test. They 
are so short that the factor of fatigue does not markedly 
enter and there seemed not to be any dislike of them on the 
part of the subjects. But with such tests it is difficult to 
obtain the serious attention usually secured with an in- 
telligence test. As a consequence the subjects are more 
easily distracted than when taking the latter type of test. 
With the repetition of an intelligence test there is usually a 
desire on the part of an individual to improve his score; 
while with the speed tests there is generally a lack of interest 
or enthusiasm, an attitude that does not constitute an 
impenetrable barrier to distractions, amusement at one’s 
neighbor, etc. 

These remarks about the tests used in this study are made 
by way of emphasizing and clarifying the statement that the 
reliability of a particular test is a function, in a large measure, 
of the attitudes of the person tested. The data presented 
show quite conclusively that there is no general law whereby 
increasing the length of a test parallels an increase in its 
reliability. Moreover those performances in which there is 
an increase in reliability with a longer test vary in the relative 
amounts of such increase. On the whole the problem appears 
to be experimental and not mathematical. And it is an 
important problem, this one of so determining the optimal 
conditions for the presentation of test material as to 
engender favorable attitudes and thus to promote accurate 
measurements of the abilities studied. 

In psychological and educational research there is at the 
present time a distinct tendency towards substituting mathe- 
matical treatment of data for careful experimentation and 
control of conditions. Mathematical formulæ like the Spearman 
which we have been studying are based upon assumptions 
of accurate and representative data. Until these are secured 
the use of such formulæ should be limited and the conclusions 
from them should be subjected to experimental verification. 
Trained statisticians usually are aware of the limitations 
relating to the use of their methods and of the care that should 
be exercised in the interpretation of results obtained by them. 
But the untrained individual who has merely the knowledge 
of how to manipulate the devices placed in his hands is in 
danger of straying far beyond these limits, and he usually 
does so stray, and the untrained person is not alone in this 
respect. When one reads some of the experimentation of 
Spearman, and notes how he corrects for attenuation coefficients 
of correlation from an exceedingly small number of 
cases and proceeds to far-reaching generalizations from the 
results, one cannot resist the notion that somehow the 
perfection existing in his hypothetical mathematical world 
has blinded him to the limitations of his experimental data. 


CONCLUSIONS 
1. Six of the twelve tests used in this study might be 
decreased in length at least one-half with no loss in their 
reliability. This seems to show that reliability is not a 
function solely of the length of a test and that the Spearman 
‘law’ does not have the unlimited application implied by the 
formula. 

2. For the six tests in which there was an increase in 
reliability with an increase in the length of the test, the actual 
reliability was generally far below the predicted coefficient. 

3. In general, when the predicted and the actual relia- 
bilities were secured by correlating subdivisions of a single 
test, the agreement between them was much closer than 
when the correlations were of parts of two separate applica- 
tions of the test. This indicates that the attitude of the 
subject, which may be assumed to be more or less constant 
throughout one sitting, is an important factor in determining 
a test’s reliability. 

4. The reliability coefficient does not appear, from these 
results, to have any very close relationship to the number of 
subjects used. There appears no ground for the assumption 
of a general increase in reliability proportionate to the increase 
in the number of subjects, such as the prediction formula 
gives. Careful selection of typical subjects and attention to 
experimental technique probably are more important than 
securing an unlimited number of cases. As Spearman pointed 
out in his article of 1904, ‘errors of observation’ will not be 
lessened by the repetition of them with more and more 
subjects. 

APPENDIX 
A. Brown's demonstration of the prediction formula (1).

‘$r_2$ measures the extent to which the amalgamated series of results of the two
tests would correlate with a similar amalgamated series of two other applications of the
same test. If $x_1, x_2, x_1', x_2'$ be two pairs of results ($x$ denoting, as usual, deviations from
the mean value) we may assume that

$$\sigma_{x_1} = \sigma_{x_2} = \sigma_{x_1'} = \sigma_{x_2'} = \sigma \ \text{(say)}$$

and that

$$S(x_1 x_1') = S(x_1 x_2') = S(x_2 x_1') = S(x_2 x_2') = S(x_1 x_2) = S(x_1' x_2') = n\sigma^2 r_1.$$

Hence we get

$$r_2 = \frac{S(x_1 + x_2)(x_1' + x_2')}{n\,\sigma_{x_1 + x_2}\,\sigma_{x_1' + x_2'}} = \frac{4n\sigma^2 r_1}{n(2\sigma^2 + 2\sigma^2 r_1)} = \frac{2r_1}{1 + r_1}.$$

It is easily seen that the amalgamation of four tests gives a reliability coefficient

$$r_4 = \frac{4r_1}{1 + 3r_1},$$

and in general we have

$$r_n = \frac{n r_1}{1 + (n - 1) r_1}.\text{’}$$

A complete derivation of this last formula is given in appendix B. From that derivation
the reader not versed in statistics can supply the steps omitted in Brown’s
demonstration.
B. Demonstration of the prediction formula. 

The demonstration below is similar to that found in Kelley’s Statistical Method
(pp. 196-197), except that the standard scores are not assumed. The notation, too,
differs from that of Kelley, because the writer believes that the notation here used
will be more readily grasped by the non-technical reader. It was mentioned above,
and might well be repeated here, that Spearman’s formula for the correlation of sums
is the source of the demonstrations by Kelley and by the present writer.

Let $x_1, x_2, \cdots, x_a$ and $x_1', x_2', \cdots, x_a'$ be scores as deviations from their respective
means, $a$ denoting the number of tests to be summed in each of the two series of tests
of function $x$.

Then

$$r_{(x_1 + \cdots + x_a)(x_1' + \cdots + x_a')} = \frac{\Sigma (x_1 + x_2 + \cdots + x_a)(x_1' + x_2' + \cdots + x_a')}{\sqrt{\Sigma (x_1 + x_2 + \cdots + x_a)^2}\,\sqrt{\Sigma (x_1' + x_2' + \cdots + x_a')^2}}$$

But if we assume that the successive measures of the function $x$ have the same means
and the same standard deviations, then

$$\Sigma x_1 x_1' = \Sigma x_1 x_2' = \Sigma x_1 x_2 = \Sigma x_2 x_1' = \Sigma x_2 x_2', \ \text{etc.} = N\sigma_x^2 r_1$$

and

$$\Sigma x_1^2 = \Sigma x_2^2 = \cdots = \Sigma x_a^2 = \Sigma x_1'^2 = \Sigma x_2'^2 = \cdots = \Sigma x_a'^2 = \Sigma x^2 = N\sigma_x^2.$$

As Kelley points out, for an analogous equation (7), the product of the two terms in
parentheses in the numerator of the above equation gives a polynomial of $a^2$ terms, each
of which is of the sort $\Sigma x_1 x_1'$; but since all of these terms are equal by hypothesis,
the numerator may be written $a^2 N\sigma_x^2 r_1$. In the denominator, the square of the
first term gives a polynomial of $a^2$ terms, $a$ of them being of the sort $\Sigma x^2$ and the balance
$(a^2 - a)$ of the sort $\Sigma x_1 x_2$. The equation then becomes

$$r_{(x_1 + \cdots + x_a)(x_1' + \cdots + x_a')} = \frac{a^2 N\sigma_x^2 r_1}{\sqrt{[a\Sigma x^2 + (a^2 - a)N\sigma_x^2 r_1][a\Sigma x^2 + (a^2 - a)N\sigma_x^2 r_1]}}$$

$$= \frac{a^2 N\sigma_x^2 r_1}{aN\sigma_x^2 + (a^2 - a)N\sigma_x^2 r_1}$$

$$= \frac{a^2 N\sigma_x^2 r_1}{aN\sigma_x^2 [1 + (a - 1) r_1]}$$

$$= \frac{a r_1}{1 + (a - 1) r_1}.$$
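The counting of terms in the demonstration above can be checked numerically. The sketch below is an illustration only and forms no part of the original demonstration: it assumes, as the derivation does, equal standard deviations and equal intercorrelations $r_1$ among all parts, and the function names (`correlation_of_sums`, `prediction_formula`) are hypothetical. The factor N is omitted because it cancels between numerator and denominator.

```python
def correlation_of_sums(a, r1, variance=1.0):
    """Correlation between two sums of a parts each, found by counting terms.

    Every part is assumed to have the same variance, and every pair of
    different parts (within a series or across the two series) the same
    correlation r1.
    """
    # Numerator: a * a cross-products of the sort x_i * x_j', all equal.
    numerator = sum(variance * r1 for i in range(a) for j in range(a))
    # Square of one sum: a terms of the sort x_i ** 2 and (a * a - a)
    # terms of the sort x_i * x_j with i != j.
    square = sum(variance if i == j else variance * r1
                 for i in range(a) for j in range(a))
    # Both series have the same square, so the root of their product
    # is simply that square.
    return numerator / square

def prediction_formula(a, r1):
    """Closed form reached at the end of the demonstration."""
    return a * r1 / (1 + (a - 1) * r1)
```

For any number of parts and any positive $r_1$ the two functions agree to within rounding error, which confirms the algebra of the demonstration but says nothing as to whether the assumptions hold for a given test.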





REFERENCES

1. Brown, Wm. The correlation of mental abilities, Brit. J. Psychol., 1910, 3,
290-322.
2. Brown, Wm., & Thomson, G. The essentials of mental measurement. Camb.
Univ. Press, 1920.
3. Crum, W. L. Note on the reliability of a test [...], Amer. Ma[...], 192[...], 30, 29[...]
4. Gordon, K. Group judgments in the field of lifted weights, J. Exper. Psychol.,
1924, 7, 398-40[...]
5. Holzinger, K. Note on the use of the S[...],
J. Educ. Psychol., 1923, 14, 392-305.
6. Holzinger, K., & Clayton, B. Furt[...]
the Spearman prophecy-formula, ibid., 1925, 16, 289-3[...]
7. Kelley, T. L. Statistical method. Macmillan, New York, 1923.
8. Kelley, T. L. Note on the reliability of a test, J. Educ. Psychol., 1924, 15, 193[...]
9. Kelley, T. L. The applicability of the Spearman-Brown [...]
ment of reliability, ibid., 1925, 16, 300-303.
10. Ruch, G. M., Ackerman, L., & Jackson, J. D. An empirical study of the Spearman-Brown
formula as applied to educational material, ibid., 1926, 17, 3[...]
11. Seashore, C. E. Manual of directions and interpretations, et[...] Columbia R[...]
Company.
12. Shen, E. The standard error of certain estimated coefficient[...],
J. Educ. Psychol., 1924, 15, 462-465.
13. Spearman, C. The proof and measurement of association between two things,
Amer. J. Psychol., 1904, 15, 72-101.
14. Spearman, C. The demonstration of formulæ for [...]
association, ibid., 1907, 18, 161-169.
15. Spearman, C. Correlations calculated from faulty data, Brit. J. Psychol., 1910,
3, 271-295.
16. Spearman, C. The correlation of sums and differences, ibid., 1913, 5, 417-420.
17. Wood, B. D. The reliability and [...]
Board examination in algebra and geometry. Published by the B[...]
18. Wood, B. D. Studies of achievement tests, pt. iii, J. Educ. Psychol., [...],
263-2[...]











APPARATUS 
A CONVENIENT MIRROR-DRAWING DEVICE 


BY HARVEY C. LEHMAN AND PAUL A. WITTY 


University of Kansas 


Mirror-drawing simply and clearly demonstrates certain 
essential elements in habituation and learning. It involves 
learning of a relatively simple, trial-and-error type; again, the 
procedure is definite and precise, enabling groups to partici- 
pate simultaneously without confusion and misunderstanding; 
and, finally, the results of the experiment are applicable to 
many situations, clearly illustrating various principles to be 
acquired by the student and affording the instructor an 
excellent device for inductive teaching. 





Fig. 1. 


The chief obstacle encountered by the laboratory instructor 
in presenting the experiment has been the clumsy and 
awkward character of the apparatus generally used. It has 
been necessary at times to assemble and dismantle the 
various sections of the apparatus for each experiment, a 
time-consuming and unnecessary process. Some instructors have 
devised permanent set-ups. But these ever-ready devices are 
often badly proportioned and unduly large. As they cannot 
be readily stored, they are dust-collectors which must always 
be cleaned before use. 

Experience with these various inconveniences has led to 
the construction of the collapsible, all-metal mirror-drawing 
apparatus illustrated in the accompanying photographs. 





The first cut shows the device in use. The experimenter 
sees the reflection of his hand in the mirror by glancing 
through the aperture cut in the shield. The shield is held 
in a horizontal position by a wire hinge (see the second cut); 
and it revolves in a perpendicular plane when the apparatus 
is folded. The mirror is constructed of polished metal. 
Nickel zinc is used for the reflecting surface; but any highly 
polished surface would do. When the experiment is completed, 
the hinge may be revolved and brought to rest against 
the mirror. The shield is thus permitted to drop down against 
the surface of the mirror. The basal plate is so constructed 
that it rests behind the mirror when folded flat. 

When in use the mirror is prevented from falling backward 
by the weight of the shield, which causes the mirror to 
balance forward about 5 degrees. A slug is soldered to the 
base, serving the additional purpose of holding the star- 
pattern in place during operation. Since the slug presses 
against the base, the star-pattern is placed in position by 
first tipping the mirror slightly backward and then slipping 
the star-pattern beneath the slug. When the mirror is 
allowed to fall forward the weight of the mirror binds the 
paper star-pattern in place. 

The apparatus herein described may be constructed with 
numerous modifications; e.g., it is possible to construct the 
mirror by using one thickness of metal only, instead of 
setting it in a frame of tin. It is inexpensive; it has no glass 
mirror to become spotted, chipped or broken; and it is 
compact when folded. (Many can be stowed away in an 
exceedingly limited space.) No thumb-tacks or pins are 
needed for holding star-patterns in place; there is no wooden 
surface to become uneven and ineffective from thumb-tack 
imprints, and it is in one single piece. If the mirror eventually 
becomes dull, its luster can be easily renewed. The device 
is quickly assembled and dismantled. To adjust the star-pattern 
it is only necessary to push the mirror-surface back 
and to slip the pattern beneath the binding strip at the base. 
The mirror-surface comes forward to its natural position and 
the binding strip holds the star-pattern in place. The 
apparatus is then ready for use. 











THE RELATION BETWEEN PHYSIQUE AND 
PERFORMANCE 
BY GEORGE J. MOHR 
Institute for Juvenile Research, Chicago 
and 
RALPH H. GUNDLACH 
University of Illinois 
During recent years numerous studies of the abnormal 
have centered about the relationship of bodily habitus to 
mental disturbance. A number of investigators, including 
Kretschmer,¹ have undertaken to show that a marked correspondence 
does exist between the various forms of physical 
build and mental disturbance. Kretschmer specifically contends 
that the ‘pyknic’ habitus is characteristically associated 
with the manic-depressive psychosis, while the ‘athletic’ and 
the ‘asthenic’ varieties of habitus are clearly associated with 
the schizophrenic forms of insanity. Not only does a corre- 
spondence of physical build and psychic structure hold true 
in the field of the abnormal but a similar correspondence may 
also be determined—as he thinks—among normal individuals. 
Kretschmer’s studies have led him to believe that at least 
two ‘temperamental types’ may be determined among the 
normal population and that these two types are related in 
character to the two major groups of insanity. While the 
manic-depressive insanity is typically characterized by a 
cyclic variation in mood, so that the patient may pass from 
a state of extreme depression to one of maniacal excitation, 
among the normal there are many individuals who show this 
same tendency, albeit the fluctuations remain within much 
narrower limits. These individuals Kretschmer considers as 
of the cyclothymic disposition or temperament. In contrast 
is that large group who evince in minor degree the charac- 
teristics that under pathological conditions become the 
symptoms of the schizophrenic disorders. The well-developed 
symptoms of schizophrenia, such as emotional blunting, 
disorders of attention and disturbance in the associative 
processes, may find their correlatives in diffidence, preoccupation 
and self-centeredness of certain normal individuals. 
The latter, according to Kretschmer, are of the 
schizothymic temperament. 

¹ E. Kretschmer, Physique and character (Eng. trans. from 2nd German ed. by W. 
J. H. Sprott). New York & Washington, 1925. 

Kretschmer further holds that the correspondence of 
certain types of physique associated with these temperaments, 
noted among insane patients, may also be demonstrated 
among the normal. He describes the asthenic, athletic, 
pyknic and dysplastic bodily forms. The asthenic individual, 
said to be of the schizothymic temperament, is one who is of 
average height but is relatively tall for his weight. He is thin, 
with a long narrow shallow chest. His shoulders are relatively 
broad contrasted with the diameter of his chest. His muscles 
are thin and poorly developed, his skeletal structure is slight. 
The skin is thin and loosely attached to the underlying 
tissues. The face is characteristically long and narrow, with 
a prominent nose and clear-cut features. The facial angle is 
sharp and the mid-face is relatively long. 

The athletic physique is similar to the asthenic in general 
bodily proportions but all of the structures are thicker, 
firmer and of more robust development. The shoulders are 
heavy, the chest is broad and of medium depth. The skeleton 
is heavily built. The muscles are thick, of good tone and 
are well contoured. The skin is thick and closely adherent. 
The face is relatively long and narrow, with proportions 
similar to the asthenic, but with thick though well-defined 
features. The facial angle is less marked than in the asthenic 
and the lower jaw more heavily developed. The athletic 
build is considered a variant of the asthenic, and is associated 
with the schizothymic group. 

The pyknic habitus is described as one in which there is an 
increase in the volume of all of the body cavities. The 
head is large, the chest is voluminous and exceptionally 
broad and deep. Although the shoulders are of moderate 
width they appear narrow contrasted with the broad chest. 
The abdomen is full. The skeletal structure is slight when 
compared with the general bulk of the individual and the 
extremities are relatively small and slender. The hands are 
small and delicate. There is a generous adiposity and the 
skin is thick and firm. The face is round and the midface 
is short. The complexion is ruddy. This bodily form is 
associated with the cyclothymic disposition. 

The dysplastic type includes many deviants from the 
normal. In this group are those physical forms evidencing 
disturbance of the various ductless glands. The elongated 
form of the eunuchoid, hypoplastic forms and those in which 
there have been localized developmental disturbances, are 
included. 

In addition to these four groups there are mixtures 
of the various types and certain forms that fail to fit into 
the classification. On the basis of his observation of 260 cases, 
Kretschmer reports that 70.3 per cent. of the schizophrenic 
patients present physical forms corresponding with the 
asthenic and athletic types and that only 2.9 per cent. of 
these patients present a pyknic habitus. Of the manic-depressive 
patients 84.7 per cent. presented pyknic build 
and only 10.6 per cent. the athletic-asthenic form. From 
these findings, together with observations among normal 
individuals, and from characterologic studies of families of 
normal and psychotic subjects, Kretschmer develops his 
entire theory of temperament and the association of temperament 
with physical build. The theory of types is summarized 
(see Tables I and II) by Kretschmer, who declares that the 
formulation is the “direct empirical result” of investigation. 


TABLE I 

THE TEMPERAMENTS ² 

                          Cyclothymes               Schizothymes 

Psychæsthesia and mood    Diathetic proportion:     Psychæsthetic proportion: 
                          between raised (gay)      between hyperæsthetic 
                          and depressed (sad)       (sensitive) and anæsthetic 
                                                    (cold) 

Psychic tempo             Wavy temperamental        Jerky temperamental curve: 
                          curve: between mobile     between unstable and tenacious 
                          and comfortable           alternation in mode 
                                                    of thought and feeling 

Psychomotility            Adequate to stimulus,     Often inadequate to stimulus: 
                          rounded, natural,         restrained, lamed, inhibited, 
                          smooth                    stiff, etc. 

Physical affinities       Pyknic                    Asthenic, athletic, dysplastic, 
                                                    and their mixtures 

² From Kretschmer, op. cit., p. 258. 


TABLE II 

SPECIAL DISPOSITIONS ³ 

                  Cyclothymes                    Schizothymes 

Poets             Realists                       Pathetics 
                  Humorists                      Romantics 
                                                 Formalists 

Experimenters     Observers                      Exact logicians 
                  Describers                     Systematists 
                  Empiricists                    Metaphysicians 

Leaders           Tough ‘whole-hoggers’          Pure idealists 
                  Jolly organizers               Despots and fanatics 
                  Understanding conciliators     Cold calculators 


Many investigators ⁴ have endeavored to determine whether the physical types 
can readily be identified, and, further, to clarify the possible relationship of these 
physical forms to the insanities. Apparently there has been little difficulty in the 
actual identification of the individual types, although there is a wide divergence 
among the investigators in the matter of the incidence of these physical types among 
insane patients. Although there is a general agreement that in the manic-depressive 
group a greater incidence of the pyknic build is found, there are reports of presum- 
ably typical cases of the manic-depressive insanity clearly not associated with the 
pyknic habitus. Gruhle has recently brought together the findings of a large number 
of investigations and has summarized the reports in a number of charts which are in 
part here reproduced. 

The disparity between the findings of the various investigators may, in part, 
be dependent upon divergent criteria used in selection of the groups. The investigators 
are perhaps not equally clear as to what constitutes pyknic or athletic-asthenic 
build. This is to be expected since the groups are defined only in descriptive terms. 
Despite this, a relationship is shown between the cyclic insanities and the pyknic 
build, and a relatively greater incidence of the athletic-asthenic builds is found among 
schizophrene patients. 

³ From Kretschmer, op. cit., p. 261. 

⁴ H. W. Gruhle, Der Körperbau der Normalen, Arch. f. Psychiat. u. Nervenkr., 
1926, 77, 31; Kolle, K., Der Körperbau der Schizophrenen, ibid., 1924, 72, 40-88; 
Körperbauuntersuchungen an Schizophrenen, ibid., 1925, 75, 21-62; Körperbaustudien 
bei Psychosen, ibid., 1926, 77, 115-150; F. v. Rohden, Körperbauuntersuchungen an 
geisteskranken und gesunden Verbrechern, ibid., 1926, 77, 77-151; F. v. Rohden und 
W. Grundler, Ueber Körperbau und Psychose, Zsch. f. d. ges. Neur. u. Psychiat., 1924, 
93, 37-78 (This article contains fairly extensive bibliography); F. Mauz, Ueber Schizophrene 
mit pyknischen Körperbau, ibid., 1923, 86, 96-122; Krankheitseinheit und 
Mischpsychosen (part ii), ibid., 1926, 101, 15-35. 

⁵ K. Kolle, Klinische Beiträge zum Konstitutionsproblem, Arch. f. Psychiat. u. 
Nervenkr., 1926, 77, 183-238; 1926, 78, 93-164. 

Chart I. Pure pyknics in their distribution in the groups. 

Chart II. Pyknic types and their mixed forms in their distribution in the groups. 

Chart III. Asthenic-athletic types and their mixed forms in their 
distribution in the groups. 

Gruhle’s ⁶ own study of 118 normal unselected individuals distinguishes the 
physical forms described by Kretschmer and shows their incidence among his normal 
group. After a careful scrutiny of the several physical attributes upon which a diagnosis 
of each type is made, he concludes that the pyknic type is the ‘best type’ 
inasmuch as there are more definite features that tend clearly to define the group. 

Van der Horst ⁷ and Oliver ⁸ note that at times the athletic and the asthenic 
types can be distinguished only with difficulty. Kretschmer ⁹ has recently included 
these two forms and their mixtures in the single classification of the leptosome group. 
The athletic and asthenic forms are here considered as extreme variants of a single 
type and their significance in relation to temperament is presumably the same. In 
view of the uncertainty as to the exact status of these two groups, we would call 
attention to a suggestion made by Gruhle ¹⁰ that an individual may embody constitutional 
factors tending toward production of the asthenic or pyknic type, but that 
certain phases of his environment, such as those determined by his occupation, will 
modify or accentuate these tendencies. He also states that there are relatively few 
subjects of pyknic build among younger individuals and points out that an older age 
must be attained before the characteristics of this build are well developed. 

Kretschmer himself does not seek a too clear-cut distinction between the physical 
types he describes. His conception is as follows: “The important idea about a 
type is that it possesses a firm center but not hard and fast boundaries. Types as a 
rule can only be determined intrinsically; we cannot mark their boundaries. By 
‘type’ we mean a nucleus of more distinct and among themselves quite firm formations 
which have been deliberately lifted out from a sea of progressive transitions. This 
holds good for a racial type as well as a personality or a clinical reaction type.” ¹¹


We find, then, that the distinctions among the physical 
types are quite variable and inexact since they are based 
largely on descriptive characterizations; but that a positive 
relationship between physical structure and psychosis has 
been found to exist by a number of observers. As regards
the relationship between bodily habitus and normal
temperament, Kretschmer did not investigate the physique of
normal subjects. He studied historic figures whose physical
classification was determined by their likenesses and whose 
temperaments were well known or determined by their 
creative works, family histories and descriptive accounts of 
the individuals. 


6 Op. cit.

7 Van der Horst, Experimentelle-psychologische Untersuchungen zu Kretschmer’s
‘Körperbau und Charakter,’ Zsch. f. d. ges. Neur. u. Psychiat., 1924, 93, 341-380.

8 H. G. Oliver, Der Körperbau der Schizophrenen, ibid., 1922, 80, 489-498.

9 Quoted by Van der Horst, op. cit., p. 342.

10 Op. cit., p. 24.

11 E. Kretschmer, Hysteria. (Eng. trans. by O. H. Boltz.) New York, 1926.








RELATION BETWEEN PHYSIQUE AND PERFORMANCE 123 


Van der Horst ¹² working with both normal and insane
subjects reports differences in performance on a number of 
objective tests which he explains on the basis of differences 
inherent in the various physical types. The reported differ- 
ences in performance were clear-cut and striking; and certain 
of the tests were in part adopted in our own investigation. 
The details of his findings will be given along with the con- 
sideration of our results. He measured differences in speed of 
rotation at which color-fusion occurred, reaction time, tap- 
ping rate, and range of attention; and he uniformly reports 
differences which give no overlapping of the leptosome with 
the pyknic group. Moreover, the differences between the 
physical types were more marked than those between the 
insane and the normal subjects. 


STATEMENT OF THE PROBLEM 


The clear delineation of types in many of the German 
studies seems remarkable in view of the fact that psychological 
‘types’ are not usually found, but only a more or less sym- 
metrical distribution of cases. If by some purely objective 
estimate of physique any tendency towards bi- or tri-modality 
could be found, this would indeed be significant for a definition 
of types of physical structure; but for a study of the effect 
of physique on performance one could compare individuals 
selected from the extreme cases of a distribution of physical 
build. Any differences found there would be a function of the 
build, and would not be dependent upon the existence of more 
or less definite ‘types.’ 

It is obvious that physique does determine certain kinds of 
performance, as is instanced by the types of men who excel in 
certain athletic events. The tall slender (asthenic) man
excels in running and perhaps jumping, the more robust
(athletic) in field events and boxing, while the heavy (pyknic)
type excels in tug-of-war, or in line positions on the football team.
Although these are differences in performance related to 
physique they are but slightly significant in reference to 

12 Op. cit.










performances involving more especially the central nervous 
system in its higher or more complex functions. 

Physique itself is dependent ultimately upon the body 
chemistry. As Kretschmer has stated “it is an empirical 
fact that the endocrine system has a fundamental influence on 
the mentality, and especially on the temperamental qualities, 
and it has been demonstrated with reference to the thyroid 
gland by the medical examination of cretinism, myxoedema, 
cachexia strumipriva, and exophthalmic goitre, and with 
reference to the interstitial glands by the experiments in 
castration. . . . It is not a great step to the suggestion that 
the chief normal types of temperament, cyclothymes and 
schizothymes, are determined, with regard to their physical 
correlates by similar parallel activity on the part of the 
secretion by which we mean ... the whole chemistry of 
the blood.” ¹³ Evidence of a relationship between physical
structure and performance has been noted in divergent fields, 
in gross physical performances, in the effects of the endocrine 
glands, in physical habitus characteristic of different in- 
sanities and in differential tests on normal and insane subjects. 

The purpose of the experiments presently to be described 
was to ascertain, first, whether the various physical types 
could be determined among a convict population; then, with 
a selected number of typical men, or with the extreme cases 
in a general distribution, to determine, secondly, the differ- 
ences in performance on a large battery of tests. The tests 
were selected to show any affinity which might exist between 
athletic-asthenic subjects and the schizothymic temperament, 
and between the pyknic subjects and the cyclothymic temper- 
ament. Such a battery is hard to devise, since temperament 
has not yet been objectively defined or measured. The 
tests which van der Horst found so successful were used as a 
nucleus, since the normal and insane subjects of the same 
physical build showed marked similarities in performance. 
It was thought advisable to use his tests, that his work might 
be checked with subjects of different nativity and environ- 
ment. 


13 E. Kretschmer, Physique and Character, 254-255.









THE EXPERIMENTS AND THE RESULTS


A. Preliminary Survey. Our subjects were selected from 
among the American-born inmates of the Illinois State 
Penitentiary at Joliet. In a preliminary study in November 
and December, 1925, 254 men were physically examined, 
described and measured in accordance with the constitu-
tional scheme given by Kretschmer.¹⁴ The preliminary
work demonstrated that the physical types could be se- 
lected with some degree of assurance from among the 
men. It was noted that the incidence of these types corre- 
sponded quite closely with that given by Kretschmer for 
schizophrenic patients. Since disturbances of association and 
attention are most strikingly present in the schizophrenic 
patients, the groups were investigated from that standpoint. 
The Kraepelin Attention Test ¹⁵ and the Kent-Rosanoff
Word-Association Test were used; but these tests failed to 
distinguish the groups. Inasmuch as there were found but 
25 men of pyknic build in the 254 cases, it was decided to 
obtain a larger group of this type for purposes of a more 
thorough comparison with an athletic-asthenic group. 

In accordance with this plan a total of 600 men were
subjected to examination and from these 90 were eventually
selected. This number was approximately evenly divided
between the pyknic and asthenic-athletic groups. The
physical measurements were completed between April and
June, while the psychological measurements were completed
by the first of August, 1926.¹⁶

B. Selection of Subjects. Native-born white prisoners only 
were examined. The procedure consisted in taking the actual

14 Ibid., p. 11.

15 E. Kraepelin, Clinical psychiatry, p. 106. (Abst. from 7th Ger. ed. by A. R.
Diefendorf.) New York, 1907.

16 We are indebted to Dr. Walter B. Martin, Mental Health Officer of the Prison, 
who made it possible for us to work at the institution, gave us the use of his office 
and files, administered the group-test and provided valuable clerical assistance; to 
Warden Green, who permitted us to work within the prison and arranged for subjects 
and assistants; to the convicts who cooperated with us as subjects; to the Depart- 
ments of Psychology of the University of Chicago and the University of Illinois and 
to Northwestern University for the use of apparatus. 










physical measurements followed by a descriptive account 
made in accordance with a modification of Kretschmer’s 
Constitutional Scheme. At the close of the examination the 
individual was typified as belonging to one of the three 
classes, or to a mixture of these, or placed in an unclassified 
group. This latter group included the dysplastic forms. 

The physical measurements made included all those 
utilized by Kretschmer in defining the physical types. All 
circumferences were measured with a steel cm-tape. Height 
and span were obtained by means of centimeter scales 
fastened to the wall. Diameters, such as shoulder-width and 
pelvic-width, were measured by ordinary calipers and read 
by means of a steel tape. The average of three measure- 
ments for each of these diameters was recorded. Head 
diameters were measured by means of a craniometer and face 
measurements were obtained by use of a small caliper. 
The first 254 subjects examined were classified as follows. 


TABLE III

INCIDENCE OF PHYSICAL TYPES AMONG CONVICTS (SUBJECTIVE CLASSIFICATION)

                                  No.    Per cent.
[illeg.]                           45      17.8
[illeg.]                           93      36.6
[illeg.]                           20       7.9
Pyknic                             25       9.8
[illeg.]                           27      10.6
Dysplastic and Unclassified        44      17.2
Total                             254


The large number of athletic-asthenic and mixed pyknic
subjects is evidence that we are dealing not with sharply
defined groups but with the two extremes and a central
group.

The distribution of types for the total number of men 
inspected is not included since during the later examinations 
the complete schedule was not carried out, but only those 
individuals showing definite pyknic characteristics were sub- 
jected to complete measurement. 
















The criteria for determination of physical types are such as to involve application 
of two distinct selective processes; first, a subjective or clinical judgment, and secondly, 
exact physical measurements. The first of these is dependent upon the examiner's 
judgment rather than upon purely objective criteria. This subjective procedure is 
essential, however, because there is no other convenient method for eliminating from 
the groupings certain individuals who betray obvious abnormalities that would not 
be apparent on physical measurement. For instance, a hypo-pituitary individual 
might very well by virtue of his adiposity yield measurements that would have him 
conform to the pyknic type; an acromegalic might fall within the limits of athletic 
measurements. Such individuals would readily be characterized properly on in- 
spection but might very well fail to be eliminated by gross measurements. Further- 
more, such characteristics as vasomotor instability, hair distribution, thickness, 
smoothness, color and turgor of the skin, are not readily expressed in purely objective 
terms, although exact anthropometric technic tends to approach objectivity. The
second selective process, the application of physical measurements, is more specific 
and less subject to error provided definite standards are available. Unfortunately 
such standards are as yet in the process of being evolved. They probably vary 
among different peoples and there are today insufficient recorded examinations to 
provide standards among an American population. Average measurements given by 
Kretschmer and other investigators are available; but it is uncertain how strictly 
applicable these are to an American group. Most of the examiners do not record 
the variations about the given means. Certain relationships of measurements, such 
as the ratio of shoulder to chest measurements, the Rohrer ¹⁶ index, the Pignet index,¹⁷
and the ponderal index,¹⁸ are available for comparison.


A number of distributions of the physical measurements 
for these groups were made. It was found that the pyknic 
group exceeded in chest, abdomen and hip circumference, with 
the athletic intermediate and the asthenic smallest. Weight 
gave the same order of distribution, but height ranked the 
athletic first, asthenic intermediate, and pyknic smallest. To 
obtain a distribution based on physical measurements such 
that there would be little overlapping, an index of physical 
build was obtained by determining the sum of the measure- 
ments of the three circumferences and of the weight; this 
sum was divided by the measurement of the height. A 
distribution of this index was made and overlapping cases 
were arbitrarily discarded, leaving thus the extremely small 
indices for the asthenic, the large ones for the pyknic and the 
narrow intermediate range for the athletic subjects. The 
range and number of individuals included and rejected for 
each of the groups is as follows: 


16 K. Kolle, Körperbauuntersuchungen an Schizophrenen, Arch. f. Psychiat. u.
Nervenkr., 1925, 75, 26, 58-60.

17 Same, Der Körperbau der Schizophrenen, ibid., 1924, 72, 58.

18 S. Naccarati, Arch. of Psychol., 1921, no. 45.










TABLE IV

THE RANGE OF INDEX OF BUILD FOR 188 MEN CLASSED AS ASTHENIC,
ATHLETIC, OR PYKNIC

                 Selected Group           Rejected Group
               Range        No.        Range                 No.
Asthenic     17.2-18.4      23       18.2-19.9                20
Athletic     18.5-19.6      34       18-18.8; 19.4-22.7       55
Pyknic       19.9-24.2      44       18.4-19.9                12

















This index provides a unilinear measure of physical build 
which may be utilized in the more precise determination of 
relationships within the groups by the method of correlation.¹⁹
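The index of build described above can be sketched in a few lines of Python. This is an illustration only, not the authors' computation: the function name and the scale factor of 10 are assumptions. The factor is inferred from the tabled values, which run about ten times the raw ratio, although the text does not state it. The inputs are the group means printed in Table VI.

```python
def build_index(chest_cm, abdomen_cm, hip_cm, weight_kg, height_cm, scale=10.0):
    """Sum of the three circumferences and the weight, divided by height.

    The scale factor of 10 is an assumption made so that the result falls
    in the range of the tabled indices (roughly 17 to 24).
    """
    return scale * (chest_cm + abdomen_cm + hip_cm + weight_kg) / height_cm


# Group means of the selected Joliet men, as printed in Table VI.
TABLE_VI_MEANS = {
    "asthenic": dict(height_cm=172.9, weight_kg=59.6, chest_cm=85.9,
                     abdomen_cm=76.4, hip_cm=87.4),
    "athletic": dict(height_cm=174.2, weight_kg=66.9, chest_cm=93.4,
                     abdomen_cm=79.7, hip_cm=91.0),
    "pyknic": dict(height_cm=171.5, weight_kg=77.6, chest_cm=98.8,
                   abdomen_cm=92.3, hip_cm=97.3),
}

for group, means in TABLE_VI_MEANS.items():
    print(group, round(build_index(**means), 2))
```

The index of the group means need not equal the mean of the individual indices, so small departures from the tabled group values are to be expected.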

Using van der Horst’s and Kretschmer’s averages in a 
similar index we find as follows. 


TABLE V

INDEX OF BUILD DETERMINED FOR KRETSCHMER, VAN DER HORST
AND JOLIET MEAN MEASUREMENTS

                            Van der Horst
             Kretschmer   Normal   Patient   Joliet
Asthenic       17.45                          17.92
Athletic       19.15                          19.02
Leptosome                  19.3     18.75
Pyknic         20.2        21       21.3      21.33








A simpler differentia was found in the ratio of height to 
weight. When height is plotted against weight, the division 
line between athletic and pyknic on the 180-centimeter height 
abscissa lies at 74 kg and on the 165-centimeter abscissa at 
62.8 kg. The division between the asthenic and athletic on
the same abscissae crosses at 67 and 57 kg.
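The height-weight criterion can be illustrated as follows. The sketch assumes that each division is a straight line through the two points given in the text, and that a man lying exactly on a line is assigned to the heavier class; neither assumption is stated by the authors, and the function names are hypothetical.

```python
def boundary_weight(height_cm, weight_at_165, weight_at_180):
    """Weight (kg) on a straight division line fixed by its values at 165 and 180 cm."""
    slope = (weight_at_180 - weight_at_165) / (180.0 - 165.0)
    return weight_at_165 + slope * (height_cm - 165.0)


def classify_by_height_weight(height_cm, weight_kg):
    """Assign a build class from the two division lines quoted in the text."""
    asthenic_athletic = boundary_weight(height_cm, 57.0, 67.0)
    athletic_pyknic = boundary_weight(height_cm, 62.8, 74.0)
    if weight_kg < asthenic_athletic:
        return "asthenic"
    if weight_kg < athletic_pyknic:
        return "athletic"
    return "pyknic"


# Applied to the Table VI group means (height in cm, weight in kg).
for height, weight in [(172.9, 59.6), (174.2, 66.9), (171.5, 77.6)]:
    print(height, weight, classify_by_height_weight(height, weight))
```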

Among the 87 men ruled out by the index, 25 would have 
remained with the groups on this criterion. The correlation
of the index with weight divided by height, for the 89 subjects,


19 The correlations involving the index must be considered with caution since,
due to the method of selection, the distribution of cases is not normal, but too sparse 
in the central areas. 
















was .947. With a total of 174 cases who had been classified 
as belonging to these three distinct classes the correlation 
was .865. 

Van der Horst combined Kretschmer’s asthenic and 
athletic groups into the leptosomes, because he felt that they 
were not distinct types. We have made a double compromise, 
following van der Horst, in that we take only as many asthenic 
and athletic subjects combined as pyknic, but we follow 
Kretschmer by distinguishing athletic and asthenic groups. 

A comparison of the physical measurements for Kretsch- 
mer’s group, van der Horst’s normal and insane groups and 
for the Joliet subjects is to be found in Table VII.²⁰


TABLE VIA

PHYSICAL MEASUREMENTS OF THE SELECTED MEN
Asthenic Group

Weight in kg; other measurements in cm

No.        Height   Weight   Chest Circ.   Abdominal Circ.   Hip Circ.
9904       181.2    64.5     85.2          83.9              91.4
9894       172.2    59.9     87.4          76.2              89.5
9809       157.1    45.4     81.3          66.6              78.2
85         175.8    63.6     82.2          76.7              85.5
107        169.7    60.0     85.4          73.9              87.7
123        159.5    51.3     83.7          73.8              82.7
165        175.5    62.      86.5          80.3              89.0
193        173.2    56.3     83.3          74.2              84.1
8212       178.1    59.4     86.3          77.2              86.7
8950       170.0    55.6     79.5          76.5              86.6
[illeg.]   170.0    57.0     82.6          76.0              [illeg.]
9687       178.3    63.6     88.7          76.0              [illeg.]
9739       191.5    74.0     94.2          78.5              97.2
9753       171.0    56.1     [illeg.]      75.3              86.4
9799       183.5    67.7     89.6          83.7              91.6
9823       173.0    63.2     87.2          76.2              87.5
53         169.7    59.9     89.7          73.5              87.5
47         169.4    59.9     87.7          77.3              87.5
19         167.0    54.5     84.6          75.8              [illeg.]
Mean       172.9    59.6     85.9          76.4              87.4
S.D.       7.72     6.07     3.43          3.70              3.79





20 Tables VI and VII differ in means for the various averages. The slight
differences are due to the fact that Table VI includes only the finally selected men, 
while Table VII includes all those originally placed in the respective groups on sub- 
jective classification. Further, the pyknic group given in Table VI was augmented 
by men not included in Table VII. 



















TABLE VIB

PHYSICAL MEASUREMENTS OF THE SELECTED MEN
Athletic Group

Weight in kg; other measurements in cm

No.        Height   Weight   Chest Circ.   Abdominal Circ.   Hip Circ.
[illeg.]   179.0    73.0     96.2          81.7              89.6
[illeg.]   172.4    67.2     92.0          74.1              89.3
[illeg.]   177.4    68.6     95.2          80.1              92.1
[illeg.]   172.0    67.6     95.0          72.0              92.1
[illeg.]   172.0    63.9     88.5          83.4              89.9
[illeg.]   170.4    66.6     93.5          83.0              90.9
[illeg.]   170.5    64.0     93.3          78.2              91.7
[illeg.]   170.2    62.0     96.6          74.4              88.6
[illeg.]   172.2    65.1     92.2          80.0              92.1
[illeg.]   165.2    62.7     88.5          75.4              89.1
[illeg.]   184.9    70.9     89.0          83.5              84.7
[illeg.]   168.0    65.4     96.0          75.4              91.8
[illeg.]   173.7    65.4     92.3          83.1              91.7
[illeg.]   175.4    70.5     95.8          78.6              91.0
[illeg.]   174.9    67.2     95.7          77.1              89.5
[illeg.]   179.7    73.5     94.9          83.1              96.3
[illeg.]   176.8    69.0     91.9          85.0              94.7
9768       170.5    62.4     93.5          82.7              86.5
[illeg.]   181.2    70.1     94.5          78.0              92.5
[illeg.]   175.9    66.4     90.5          84.6              91.7
[illeg.]   182.5    73.0     97.2          83.8              95.8
[illeg.]   170.3    61.3     92.8          76.5              91.8
[illeg.]   174.0    67.6     91.7          82.1              92.5
[illeg.]   171.9    63.6     93.2          81.0              88.2
[illeg.]   173.5    65.4     93.2          76.1              88.5
[illeg.]   173.5    67.3     95.0          78.5              93.6
Mean       174.2    66.9     93.4          79.7              91.0
S.D.       4.43     3.35     2.36          3.60              2.57




















C. Performance. The psychological results derived from 
a prison population are not precisely comparable to those 
derived from a normal group even though the physical 
measurements compare closely with the measurements made 
in the Army and the intelligence scores give a result but little 
below the ‘Army’ curve. The prisoner’s attitude and outlook 
modify tremendously the conduct of the individual during the 
tests. 


Rarely could there be found active cooperation from the subjects; while in-
difference shading over into hostility developed in a number of cases, especially when
the men were several times recalled. At the beginning of each psychological experi- 
ment it was briefly explained to the subject that the performance had nothing to do 
with his prison record and his cooperation was requested. Nevertheless, two men 










TABLE VIC

PHYSICAL MEASUREMENTS OF THE SELECTED MEN
Pyknic Group

Weight in kg; other measurements in cm

No.        Height   Weight   Chest Circ.   Abdominal Circ.   Hip Circ.
9627       166.8    68.3     99.0          92.2              90.0
[illeg.]   161.3    65.9     94.0          86.0              92.5
[illeg.]   171.0    79.1     97.1          104.0             96.8
[illeg.]   169.2    79.0     104.5         107.0             94.3
167        158.5    60.4     93.7          84.2              88.6
176        172.9    76.8     99.5          94.1              97.5
178        168.2    68.4     96.3          84.9              93.2
9625       170.9    79.6     99.3          96.7              100.6
9629       171.0    84.1     99.5          101.9             105.0
9643       161.5    80.4     103.5         109.5             95.7
9679       174.7    90.3     106.4         108.0             106.5
9735       177.0    73.4     101.0         88.4              94.0
9762       171.0    70.5     101.0         86.5              96.5
[illeg.]   168.5    67.2     97.3          85.1              95.1
[illeg.]   167.2    66.0     95.5          84.2              91.8
9961       165.5    79.0     96.7          94.0              97.0
9946       175.0    87.7     111.0         92.5              103.5
61         168.5    65.9     97.2          82.8              94.1
23         170.7    76.3     96.0          83.0              100.2
[illeg.]   174.2    79.8     95.0          91.0              96.2
[illeg.]   170.5    73.7     90.4          83.5              97.5
386        165.5    79.7     97.9          92.8              94.8
[illeg.]   171.8    75.8     97.8          88.2              97.8
[illeg.]   166.0    68.1     96.6          84.6              93.7
[illeg.]   172.5    69.9     97.5          85.7              93.0
[illeg.]   173.0    66.0     91.4          89.2              98.0
[illeg.]   174.0    83.9     102.8         93.4              103.0
[illeg.]   178.0    96.5     106.5         106.0             103.0
[illeg.]   173.0    88.6     99.3          99.1              101.8
333        169.5    82.1     96.4          106.5             99.2
[illeg.]   172.0    75.9     87.0          87.5              92.0
[illeg.]   178.5    91.2     103.3         101.0             103.0
9131       172.0    76.6     99.8          91.8              93.8
[illeg.]   179.0    79.0     94.9          85.5              97.3
9597       180.5    86.6     99.7          89.9              100.2
727        175.1    75.9     96.8          81.2              94.7
759        177.5    85.8     101.0         89.8              98.0
761        171.0    75.7     95.5          87.7              96.0
778        171.0    76.7     96.5          85.1              95.0
[illeg.]   172.7    84.6     105.1         97.8              99.2
[illeg.]   176.0    76.4     97.7          83.5              97.4
[illeg.]   170.0    80.6     99.7          94.2              94.2
[illeg.]   175.7    89.9     110.5         112.2             113.7
[illeg.]   179.5    79.1     97.4          87.3              96.2
Mean       171.5    77.6     98.8          92.3              97.3
S.D.       4.86     8.05     4.78          6.90              6.53




























TABLE VII

AVERAGE MEASUREMENTS FOR THE ASTHENIC, ATHLETIC AND PYKNIC GROUPS

Weight in kg; other measurements in cm

[Most cell values of this table are illegible in this copy. Column heads that survive: Joliet Asthenic; Kretschmer Athletic; Joliet Athletic; Van der Horst Normal Leptosome; Van der Horst Psychotic Leptosome; Kretschmer Pyknic; Joliet Pyknic; Van der Horst Normal Pyknic; Van der Horst Psychotic Pyknic. Each measurement row is followed by an S.D. row. The one legible column of means (column head not recoverable) reads as follows.]

Height            168.4
Weight             50.5
Shoulders          35.5
Chest              84.1
Abdomen            74.1
Hip                84.7
Forearm            23.5
Hand               19.7
[illeg.]           30.0
[illeg.]           89.4
[illeg.]           55.3
Sag. diam.         18.0
Frntl. dm.         15.6
Vert. dm.          19.9
Hair to nose
Nose to mouth       7.8
Mouth to chin       4.5
Nose height         5.8
Nose width
Zygomat. (1)
Zygomat. (2)
Angles of Jaw



















refused to complete the tests. Often the general attitude was one of resistance to 
every expenditure of energy not routine. The result is that, in many of the tests, 
the distributions are possibly more scattered, due to the partial cooperation and even 
wilful sabotage of a few of the subjects. Furthermore, it cannot be determined 
whether one of the groups is affected by prison life more than another. Our data 
can have therefore little more than a face value; Joliet prisoners did actually perform 
in this fashion. Any comparisons with “normal” or psychotic groups must consider 
these facts. 

The data for a comparison between the groups are derived 
from several sources. The prison record includes the Army 
Alpha which is given upon entrance to the institution. A 
case-record is also taken which includes previous arrests, 
marital state, occupation and similar facts. These records
are taken from the reports of the men themselves and,
except in the case of the previous court-record, are not
verified. Nevertheless they are significant. The actual
psychological testing included an individual test for each 
man which required from forty-five minutes to an hour. 
The subjects were tested from the various groups in a mixed 
order; that is, athletic, pyknic, asthenic, pyknic, etc., so 
that any modifications in procedure or variations in per- 
formance due to time of day, heat or similar extraneous 
condition would be balanced out. The Rorschach ink-blot 
test was given independently. A group-test involving substi- 
tution, cancellation and information completed the experi- 
ment. Although the group-tests were given to but thirty at a 
time, they were all given on the same day so that as little 
information as possible concerning the tests could be ex- 
changed between the men. 

1. Army Alpha: The tabulation of the Army Alpha shows 
some very striking differences. Stated in the terms used in 
the Army grouping our classes distribute as follows: 


TABLE VIII

DISTRIBUTION OF SCORES IN ALPHA BY LETTER

Columns: A, B, C+, C, C-, D, E | Mean | S.D.
Asthenic: 3, 5, 5, [illeg.], 1, 1 | 96.5 | 42.6
Athletic: 7, 5, 5, [illeg.] | 79.2 | 50.7
Pyknic: 8, 6, 10, 13, 2, 5 | 57.9 | 36.5








































The differences between the athletic, the asthenic and the 
pyknic groups are extremely marked. The mean of the 
asthenic group lies above 85 per cent. of the pyknic subjects. 
The Army Alpha correlates −.34 with the index of build.
The Army Alpha, besides providing a total score which has 
some reliability as to the general capacity of the individual, 
offers as well eight distinct types of test. The average for 
the raw scores is as follows. 








TABLE IX

RAW ALPHA SCORES

Test No.       1     2     3     4      5      6     7      8     Total Zero Scores
Asthenic      7.7   9.0   8.5   14.4   12.4   8.5   13.2   21.5          9
Athletic      6.5   8.5   7.7   11.1    9.0   8.1    8.3   18.7         18
Pyknic        5.6   7.3   6.1    7.9    7.1   5.0    6.4   13.7         40
































In every test the asthenic averages highest, the athletic 
second and the pyknic lowest. The uniformity of these 
averages in ranking the groups in an order corresponding 
inversely to the size of the index is outstanding, and may 
indicate an appreciable relationship in the general population 
between the character of the physical build and performance 
in the Army Alpha.²⁰ᵇ
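The statement that the asthenic mean lies above 85 per cent of the pyknic subjects can be checked roughly against the means and standard deviation of Table VIII. The sketch below assumes that the pyknic Alpha scores are approximately normally distributed. That is an assumption of the illustration only; the authors may simply have counted cases, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_below(value, mean, sd):
    """Proportion of a normal distribution (mean, sd) lying below `value`."""
    return normal_cdf((value - mean) / sd)


# Table VIII: asthenic mean 96.5; pyknic mean 57.9, S.D. 36.5.
share = share_below(96.5, mean=57.9, sd=36.5)
print(f"{share:.3f}")
```

Under the normal assumption the proportion comes out close to the 85 per cent given in the text.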

The groups differ in age as well as in intelligence. The 
distribution of the ages for the three groups is as follows. 








TABLE X

Age          15-19  20-24  25-29  30-34  35-39  40-44  45-49  50-54   55    Mean    S.D.
Asthenic       1      6      7      1      2      1      1      0      0    28.55   7.54
Athletic       1      9     13      2      0      1      0      0      0    28.65   6.70
Pyknic               10      8      9      5      1      6      4      1    34.75   1.60






































Since these ages are taken from the men’s report it is probable that a number of 
them are not correct, especially in view of the bimodality of the pyknic distribution 





20b Naccarati (Op. cit., p. 25) reports an average correlation of the height-weight ratio with the
Thorndike scale of .228 for 221 students, and on a group of 75 students a correla-
tion of .35 between ponderal index and the Thorndike scale.










and of the bunching in the 25-30 group among the athletic and asthenic men. This
holds for all classes, however, and the differences between the groups should therefore
retain their significance. In a great many of the performance-differences which
follow, these differences in age and intelligence must be evaluated. Age is apparently
not a factor in the lower performance of the pyknic subjects on the Army Alpha.
The eleven oldest men of this group average 59 in this test, which is slightly above
the average for the total group.

2. Tapping Rate: Van der Horst ²¹ calls attention to the
fact that in the depressive states there is a slowing both of
the metabolic and the psychological processes. The patient
in a depression moves slowly, his physiological processes are 
retarded, and his thought processes are sluggish. As evidence 
of differences in ‘psychic tempo,’ van der Horst points to 
what he calls the normal rate of tapping among his leptosome 
and pyknic groups. He states that when an individual rests 
his forearm comfortably on a table and taps with the fore- 
finger at a rate convenient to him, a definite tempo is estab- 
lished and maintained even though the test is repeated on 
various days. His patients and normal leptosome subjects 
ranged between nineteen and twenty-seven taps per ten- 
second period, while the normal and psychotic pyknics ranged 
between nine and twelve. The pyknic group consistently
tapped at a rate much slower than the leptosomes, and there was
no overlapping in rate between the groups. There were three
exceptions in the manic-depressive group. These were said
to be in a manic phase and their rates were even higher than 
those of the leptosome group. 

Due to the ambiguity and looseness of the words “con-
venient to you” in the directions, we expected a considerable
fluctuation in performance arising from different self and
occasional ‘instruction’ ²² and as a consequence gave this
tapping test first in the experiment and again at the conclusion
to determine its variability. The directions were: “Make
yourself perfectly comfortable. Rest your right forearm on
this table. Now, will you tap with your index finger upon
this key at any speed which is comfortable and perfectly
convenient for you?” A number of the men asked if they


21 Op. cit., p. 345. 
2Q. F. Weber & M. Bentley, The relation of ‘instruction’ to the psychosomatic 
functions, Psychol. Monog., 1926, 35, no. 163, 1-15.










were to tap as fast as they could. In these cases they were 
further instructed that it was any convenient rate at which 
they were to tap. In each tapping test, after the individual 
had started for a few seconds, the number of taps was counted 
for ten seconds. Three such ten-second periods were taken 
in each experiment. To determine the “normal” tapping 
rate, the variation and the consistency, the following averages 
were determined: The rate for the first ten-second period, 
the fastest rate minus the slowest rate in all of the six in- 
stances, and the increase or decrease in both of the sets of 
three ten-second periods. Results are to be found in Table XI.
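The three measures just described may be sketched for a single subject as follows. The counts are hypothetical, the function name is an assumption, and the "increase or decrease" within each set is taken here as the last count minus the first, which is only one reading of the text.

```python
def tapping_summary(first_set, second_set):
    """Summarise two sets of three ten-second tapping counts for one subject.

    first_set, second_set: counts from the tests given at the beginning
    and at the conclusion of the experiment.
    """
    all_counts = list(first_set) + list(second_set)
    return {
        # (a) rate in the first ten-second period
        "initial_rate": first_set[0],
        # (b) fastest rate minus slowest rate over all six periods
        "variation": max(all_counts) - min(all_counts),
        # (c) change within each set (assumed: last count minus first)
        "change_per_set": (first_set[-1] - first_set[0],
                           second_set[-1] - second_set[0]),
    }


# Hypothetical counts, not data from the study.
print(tapping_summary([30, 32, 33], [28, 31, 35]))
```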








TABLE XI

RESULTS FOR TAPPING TEST

                                                     Asthenic   Athletic   Pyknic
(a) Initial rate for 10-sec period       Mean          33.8       32.7      31.7
                                         S.D.          13.5       14.2      11.9
                                         Median        28.0       29.0      30.0
(b) Variation: Max-Min                   Mean          11.0       11.6       8.8
                                         S.D.          21.7       29.4      22.0
(c) Consistency over 3 10-sec periods    Mean           2.5        2.4       1.1
                                         S.D.           3.4        5.4

















The results show that there is a very slight difference in 
the initial tapping rate between the asthenic and the pyknic 
groups, which has practically no significance in view of the 
deviation of the two groups. The rates for all three groups 
are much faster than that for any of van der Horst’s subjects 
except those said to be in a manic phase. Moreover, there 
are very great variations in the tapping rate for each indi- 
vidual, although the pyknic subjects vary less than the 
other two groups. There was a tendency to an increase in 
rate in all three groups, particularly the asthenic. From 
this we may conclude that no definite tapping rate is demon- 
strated. This is contrary to the definite results reported by 
van der Horst. None of our subjects tapped at the low rate
reported by van der Horst to be typical for the pyknic group.
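The remark that the difference in initial tapping rate has practically no significance in view of the deviations can be illustrated by setting the difference of the means against its standard error. The authors report no such statistic. The sketch assumes group sizes of 19 asthenic and 44 pyknic men, taken from the number of rows in Tables VIA and VIC, and the function name is hypothetical.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of two means divided by the standard error of the difference."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Table XI, initial rate: asthenic 33.8 (S.D. 13.5), pyknic 31.7 (S.D. 11.9).
# Group sizes are assumed from the row counts of Tables VIA and VIC.
ratio = critical_ratio(33.8, 13.5, 19, 31.7, 11.9, 44)
print(round(ratio, 2))
```

A ratio well below one means that the difference is smaller than its own standard error, which agrees with the conclusion drawn in the text.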













It was considered that the rapid rate of tapping and the wide variability might 
be due in part to the peculiar conditions of the prison situation. Despite re-assur- 
ances, many of the men were tense and uneasy. Consequently this would destroy the 
conditions essential for demonstration of a “‘normal” rate. The tapping rate of 
fifteen members of the Institute Staff was determined. These rates were found to 
vary from seventeen to fifty with no grouping at any particular point. We have 
been unable to demonstrate any “normal” tapping rate. 


3. Speed of Writing: Speed of writing was used as a 
possibly more reliable measure of differences in speed of 
performance. Two of the tests of writing were: Writing
“The United States of America” at the normal rate, and
writing the same as rapidly as possible.











TABLE XII

TIME (SEC) REQUIRED TO WRITE “THE UNITED STATES OF AMERICA”

                          Asthenic   Athletic   Pyknic
Normal rate     Mean        14.0       15.0      15.2
                S.D.         2.8        3.1       4.7
Rapid rate      Mean        10.9       12.1      12.9
                S.D.         2.6        2.3       3.3

















The normal rate shows little except in the ranking of the 
means for the groups; asthenic first, athletic second, and 
pyknic third. The rapid rate, however, shows significant 
differences which are lessened by the fact that one asthenic, 
three athletic and eight pyknic subjects were unable to spell 
the words unassisted. 

This writing test correlates with the index −.236; and the
speed of writing correlates with the Alpha test .66. By the
method of simple partial correlation, using the formula

$$R_{12.3} = \frac{R_{12} - (R_{13} \times R_{23})}{\sqrt{(1 - R_{13}^{2})(1 - R_{23}^{2})}}$$

the constant element Alpha was partialed out. As a result,
the correlation of the index of build with the speed of writing,
with the factor measured by Alpha held constant, is −.017.
It thus appears that the significance of the difference between
the means of the groups is due largely to the factor measured
by Alpha. The tapping rate and the speed-of-writing test













show but slight differences in ‘psychic tempo’ among the
three groups. 
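
The partial-correlation step used above can be sketched in a few lines. This is only an illustration of the standard first-order formula, not the authors' own computation. The function name is hypothetical, and the index-Alpha correlation is not restated in this section, so the value used for it below is an assumption chosen purely for demonstration.

```python
import math


def partial_correlation(r12: float, r13: float, r23: float) -> float:
    """Correlation of variables 1 and 2 with variable 3 held constant."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    if denominator == 0.0:
        raise ValueError("undefined when a correlation with variable 3 is +/-1")
    return numerator / denominator


# 1 = index of build, 2 = speed of writing, 3 = Alpha.
r_index_writing = -0.236  # reported in the text
r_writing_alpha = 0.66    # reported in the text
r_index_alpha = -0.34     # ASSUMED for illustration; not given in this section

print(round(partial_correlation(r_index_writing, r_index_alpha, r_writing_alpha), 3))
```

Under that assumed index-Alpha value the partial coefficient comes out close to zero, which is the pattern the text describes: once Alpha is held constant, little relation between build and writing speed remains.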

4. Reaction Time: To explain the results of a reaction 
experiment, van der Horst distinguishes the inhibition of the 
melancholic patient from the blocking characteristic of the 
schizoid. Blocking is clinically described as an interference 
with thought processes which manifests itself by total in-
ability of the patient to respond. . . . “Impeded (or blocked)
patients can react just as quickly and strongly as normal
persons once the obstruction has been broken down, whereas
inhibited patients always evince the retarded character of
their movements.”²⁴ In this determination van der Horst
used a type of reaction experiment which will be hereinafter 
described. He obtained the following differences of reaction 
time with and without distraction. 


TABLE XIII
REACTION TIME IN SECONDS OBTAINED BY VAN DER HORST

                          Leptosome                   Pyknic
                   Distraction  No distr’n     Distraction  No distr’n
Healthy ..........     .58         .31             .62         .26
Patients .........     .72         .45             .89         .39
Difference .......     .14         .14             .27         .13

















From the table it appears that, without distraction, the 
reaction time of the pyknic group is faster than that of the 
leptosome. With distraction, the time is slower. Van der 
Horst attributes this result to the factor of inhibition which 
he considers would especially delay the reaction time of the 
pyknic group in the more complex situation. 

When the healthy leptosomic individual is compared with 
the schizophrenic patient, it is found that, although the 
reaction of the former is faster, the actual delay in reaction 
with or without distraction is constant. Van der Horst 
attributes this delay to the constant effect of blocking which 

23 Bleuler, Textbook of Psychiatry.

24 Op. cit., p. 353.










does not vary in the two situations. Comparing the healthy 
pyknic group with the manic-depressive patients, it is found 
that, with distraction, the actual prolongation of the reaction 
time among the latter is twice as large as that obtained with- 
out distraction. This, as van der Horst explains, is an effect 
of the factor of inhibition which is more effective under the 
distracting conditions. 


The differences between normal leptosomic and pyknic 
groups warranted further investigation. 


Our apparatus, adapted from that which van der Horst used, consisted of a black 
oblong board 114 cm x 79 cm, on which sixteen small lamps were scattered in irregular 
order. Each one was encased in a black metal cylinder to eliminate diffusion; in the 
top of each cylinder was a colored glass. There were four each of white, blue, green
and red lights. All the lights with the exception of the red were connected to the
circumference of a small copper disc and insulated from each other with bakelite.
Over this passed a metal arm which made contact for each of the lights separately
as it passed over a corresponding contact plate. An assistant turned this arm at a
speed regulated by a silent pendulum so that each light was displayed for the period
of approximately one second. The four red lamps were separately arranged so that
they could each be independently lighted by the experimenter’s reaction key. This
key would cut the circuit for the other lights, turn on one of the red lights and release
the Bergstrom chronoscope. Two lights could never be on at the same time. The
subject’s reaction key would turn off the red light and stop the pointer on the chrono-
scope. The time in sigma was read off directly and recorded. There are several
difficulties with the apparatus as it is devised. No mechanism controls the exact 
time of exposure for the various lights, nor the exact time at which the red light shall 
go on. 

The subject sat approximately two meters from the stimulus board. In the 
experiment with distraction the subject was instructed to name the color of each light 
as it went on, except in the case of the red lights. As soon as a red light appeared 
the subject was to “turn it off as rapidly as possible” by pressing the reaction key. 
In the experiment without distraction only one red light, which was indicated in 
advance, was used, while the other lights were out of circuit. Van der Horst obtained 
ten reactions according to each method. He did not state whether, with distraction, 
the red light was to be thrown in after the previous light had been exposed for one 
second, or whether he would sometimes cut the length of the previous exposure. 
Furthermore, in the case of the reaction without distraction, van der Horst did not 
say whether he used a constant or an irregular length of time between his ‘ready’
signal and the stimulus. According to the work of Woodrow²⁵ the length of time
between the ‘ready’ signal and the stimulus is very significant in determining the
speed of reaction.


Our experiment was divided into four sections of ten reac-
tions each. In the first ten with distraction, the red light was


25 H. Woodrow, The measurement of attention, Psychol. Monog., 1914, no. 76, 1-158.
















turned on after a random number of other lights had each been 
exhibited for one second. In the second set, the light prior 
to the red light was interrupted at various periods of exposure 
of less than a second. In the third set, the constant optimal 
time of two seconds elapsed between the signal “ready”
and the flash of the red light. And in the final set a varying 
length of time arranged at random between two and twenty 
seconds elapsed between the “ready” signal and the stimulus. 
The averages and deviations are to be found in Table XIV. 








TABLE XIV
REACTION TIME IN SECONDS, WITH AND WITHOUT DISTRACTION

Reaction Time                            Asthenic   Athletic   Pyknic
A. With distraction,
   regular intervals ........... Mean      .574       .599      .596
                                 S.D.      .096       .075      .093
B. With distraction,
   irregular intervals ......... Mean      .590       .614      .623
                                 S.D.      .101       .096      .120
C. Regular 2-sec interval ...... Mean      .211       .224      .237
                                 S.D.      .025       .027      .026
D. Irregular interval .......... Mean      .265       .281      .301
                                 S.D.      .029       .031      .040
E. Average, C & D .............. Mean      .241       .253      .268
                                 S.D.      .024       .026      .029

















The series with distraction show practically no differences 
between the groups, although in both cases the pyknic subjects 
are very slightly slower than the asthenic.²⁶ In the reaction
times with the two-sec interval and the irregular intervals 
there are significant differences. In both cases the pyknic 
subjects are considerably slower than the athletic, who in 
turn are slower than the asthenic. Our results give no 
support to the distinction between blocking and inhibition, as 
suggested by van der Horst, but show the pyknic group slower 
than the asthenic group in all four cases. This might lend 
slight evidence to a difference in ‘psychic tempo.’ The re- 
action time with irregular intervals correlated with the index 


26 The average difference between the asthenic and pyknic groups is almost
identical with that of van der Horst’s healthy leptosomic and pyknic groups. The
standard deviations for our groups, however, are so large as to make these differences
insignificant. Van der Horst does not give any deviations here, but merely states
that they do not affect the results.













of build .482, with Alpha −.242. This reaction-time corre-
lated with the index, the factor of Alpha held constant, was
.439.²⁷

5. Writing with Distraction: Another performance de- 
signed to show the effect of distraction was the writing of 
“The United States of America” while counting aloud by 
two’s. 


TABLE XV
TIME (SEC) REQUIRED FOR WRITING “THE UNITED STATES OF AMERICA”
WHILE ADDING BY TWO’S

                                   Asthenic   Athletic   Pyknic
(a) .................... Mean        16.0       16.1      16.4
                         Median      13         14        14
(b) 1st-3d time ........ Mean         2.3        4.7       3.2
                         Median       1.0        3.4       3.0




















This table likewise shows practically no differences between 
the three groups. Even this negative significance, however, 
is questionable, because a number of the men obviously had 
practiced although not one of them would admit it. They 
were known to ‘cell’ with other subjects and they compre- 
hended the directions for this test and for writing backwards 
with an ease not approximated by the other men. 

6. Franz Dot-Tapping Test: Since in schizophrenic dis- 
orders disturbances in attention are particularly characteristic, 
we used as one performance the Franz dot-tapping test. On 
quarter-inch quadrille paper an area 10 squares by 30 was 
ruled off. The subject was asked to begin on the left of the 
row of 10 nearest him, and to put one dot in each square; 
then to start on the right end of the second row, putting one 
dot in each of these squares, and so on, back and forth, for 


27 A curious relationship between Alpha and scores of the combined reaction time
without distraction was noted for the three groups. The correlation for the pyknic
group was −.285, for the athletic group −.32. This would indicate that the more
rapid reaction time is correlated with a high score in Alpha. The correlation for the
asthenic group, however, was +.63. Here high score on Alpha is correlated with a
slow reaction time. The correlation with the total group was −.117. Two possible
explanations for the discrepancy with the asthenic subjects are suggested. One is
the possibly greater distractibility of the more intelligent asthenic subjects; the other,
the small number of cases.










all 30 rows. The time for each row was read to an assistant 
from a continuously running stop-watch. This involves a 
small error which, however, would be sensibly constant for 
the various groups. 








TABLE XVI
FRANZ DOT-TAPPING TEST

                                                         Asthenic   Athletic   Pyknic
(a) Ave. time for each row of 10 squares (sec)  Mean       4.76       4.37      4.77
                                                S.D.     [illeg.]   [illeg.]  [illeg.]
                                                Median     4.40       4.10    [illeg.]
(b) Mean variation ............................ Mean        .49        .40       .48
                                                S.D.        .28     [illeg.]     .17
                                                Median      .41        .38       .44
(c) Coefficient of variability ................ Mean      10.24       9.11     10.25
                                                S.D.       3.74       1.61      2.34
                                                Median     8.80       8.80     10.28

















The average time for each row of ten squares for the three 
groups is practically the same, although the athletic is less 
variable and slightly more rapid. The mean variation is 
larger for the asthenic group than for the pyknic; while that 
for the athletic group is again smallest with the least deviation. 
The coefficient of variability²⁸ shows that the asthenic and
pyknic groups average the same, while the athletic is less
variable. Above the coefficient for variability 9.9, there are
20.0 per cent. of the athletic, 36.9 per cent. of the asthenic,
and 53.5 per cent. of the pyknic subjects. Neither the
intelligence nor the age of the pyknic men is responsible for
this greater variability.²⁹ The most variable three cases,
however, were asthenic subjects. The Franz dot-tapping 
test correlated with the index of build only —.013, due to the 
extreme variations of the asthenic and pyknic subjects. 
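
The coefficient of variability used in this section is defined in the authors' footnote as 100 M.V. / M. A minimal sketch of that computation follows, assuming that M.V. means the mean absolute deviation of a subject's row times from his own mean (the usual sense of "mean variation", though the text does not spell it out). The function names and the sample row times are hypothetical and are not data from the study.

```python
from statistics import mean


def mean_variation(values):
    """Mean absolute deviation of the values from their arithmetic mean (M.V.)."""
    m = mean(values)
    return mean(abs(v - m) for v in values)


def coefficient_of_variability(values):
    """C.V. = 100 * M.V. / M, following the formula quoted in the footnote."""
    m = mean(values)
    if m == 0:
        raise ValueError("C.V. is undefined for a zero mean")
    return 100.0 * mean_variation(values) / m


# Hypothetical times (sec) for four rows of ten squares; not from the study.
row_times = [4.0, 5.0, 6.0, 5.0]
print(coefficient_of_variability(row_times))
```

Because the coefficient divides by the subject's own mean, a slow but steady subject and a fast but steady subject receive similar values, which is why it is reported alongside the raw mean variation.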

7. Young’s Light Series: Another test which involves 
attention is Young’s Light Series.³⁰ The apparatus consists


28 Determined by the formula $\mathrm{C.V.} = \dfrac{100\,\mathrm{M.V.}}{\mathrm{M}}$.
29 The average coefficient of variability for the 17 pyknic subjects having an
Alpha score over 60 is 10.4; the 20 subjects who are over 33 years old average 9.2.
30 This test was devised by Dr. P. T. Young of the University of Illinois. It
will presently be published.













of a gray board two feet square. There are ten holes in a 
circle which expose small lights. These can be turned on in 
any sequence desired. A series of lights ranging from three 
to seven was used. After each series has been completed, 
the subject is requested to point to the lights which have 
gone on, in the same order in which they appeared. Each 
individual’s performance was scored by counting one for each 
light in each series correctly performed. 


TABLE XVII
RESULTS FROM YOUNG'S LIGHT SERIES

                                        Asthenic   Athletic   Pyknic
(a) ........................... Mean      79.8       87.8      71.9
                                S.D.      27.5       21.2      22.9
(b) Selected series score ..... Mean      62.2       67.5      51.0
                                S.D.      19.1       18.1      16.8

















In this test the athletic men have the highest score, with 
the asthenic second and the pyknic lowest. The performance 
on each separate light-test was tabulated to determine what 
kinds of series were passed successfully by the various groups. 
It was found that in the ten series in which two or more 
adjacent lights appeared consecutively, the pyknic subjects 
did better than, or nearly as well as, the other men; while
the very irregular series were difficult for them. The classi- 
fication of errors and successes for the asthenic subjects 
reveals no differences in the variety of light-patterns missed. 
It appears, however, that the asthenic men were fatigued by 
the test; for on the last four series the 19 men all together 
were able to pass correctly but 14 series, while the 26 athletic 
men passed a total of 39 series. The eight different series in 
which the pyknic subjects were equal or superior to the 
athletic and asthenic subjects were discarded.³¹ Since these
are of a uniform pattern, successes with them are not likely 
to be due merely to chance. This re-valuation of the test 
accentuates the differences between the groups. 


31 The average of these discarded series was, for the pyknic subjects, 22, and, 
for the athletic-asthenic, 20.5. 










In the dot-tapping and light series it is noteworthy 
that the asthenic group no longer holds the leading position in 
performance. There appears to be a disturbance of attention
in some of the asthenic men sufficient to bring the average 
performance of the group below that for the athletic subjects. 

The correlation of the light series with the index of build
is −.264. This is low, due to the fact that the asthenic
subjects, who are placed at the end of the index distribution,
are intermediate in this performance. The light series corre-
lated with Alpha .266. The series correlated with index,
when Alpha was held constant, −.193. The light series corre-
lates with the Franz dot-tapping test −.029.

8. Cancellation Test: As a further means of demonstrating 
possible differences in speed of perception, a cancellation 
test was included in the group performance. In this test a 
column of eight-lettered words was paired with a column of 
nonsense material, 7 letters in length. These letters included 
all but one of the letters in the word to the left. The subject 
is instructed to cross out in the word that letter which does 
not occur in the group of letters at the right. 
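
The rule for a single cancellation item can be sketched as a multiset comparison. The example pair is the one quoted in the authors' footnote (ABANDONS paired with BDAONSA); the function name is hypothetical, and the sketch assumes every item has exactly one missing letter, as the description implies.

```python
from collections import Counter


def letter_to_cancel(word: str, scrambled: str) -> str:
    """Return the one letter of `word` that is absent from `scrambled`.

    Letters are compared as multisets, so the missing letter may be one
    copy of a letter that occurs twice in the word.
    """
    missing = Counter(word.upper()) - Counter(scrambled.upper())
    letters = list(missing.elements())
    if len(letters) != 1:
        raise ValueError("expected exactly one missing letter")
    return letters[0]


print(letter_to_cancel("ABANDONS", "BDAONSA"))  # N
```

Counting letters rather than testing simple membership matters here: both N's of ABANDONS are "present" in the scrambled group in the set sense, yet one of them is the correct answer.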


TABLE XVIII
CANCELLATION TEST

Cancellation                         Asthenic   Athletic   Pyknic
(a) Total correct ......... Mean       39.9       36.5      26.7
                            S.D.       14.3       14.5      16.0
(b) Total attempted ....... Mean       42.3       38.8      29.5
                            S.D.       12.5       14.0      14.6

















In this test the pyknic subjects fall far behind the athletic 
and asthenic in the correct number and in the total number of 
items attempted. Age is not a factor in the low performance
of the pyknic men. The 12 men over 40 years of age average
26.75; the 13 men between 30-39 years average 29.3. 

The cancellation test correlated with the index −.327,


32 R. H. Gundlach, Effects of practice on the correlation of three mental tests,
J. Educ. Psychol., 1926, 17, 391.
33 E.g., ABANDONS—BDAONSA. One of the N’s is missing; either may be
crossed out. All the letters are capitalized.







and with Alpha .795. Correlation with the index, with the
factors measured by Alpha partialed out, is −.10. The
cancellation test correlates with reaction time (irregular
intervals) −.287.


9. Color Fusion: By means of a rough flicker method, 
van der Horst measured the speed at which fusion of red 
and green was reported by his subjects. He found that 
fusion occurred for the pyknic subjects at about 1145 revolu- 
tions per minute; for the leptosomic group at 1650. For 
his insane group the fusion occurs for the pyknic men at about 
780 revolutions, and for the leptosomes at about 1160. The 
variations were not reported, but they were said to be in- 
significant. Since these differences were so striking, we 
attempted to duplicate the experiment. 


A room darkened with tar-paper screens and illuminated by a 110-volt lamp was
used. The subject sat facing the color wheel with the lamp behind him. The rotating
disc was kept at a constant distance from the subject. The sectors were 205 degrees
of green and 155 degrees of red (Hering saturated colors). The speed of rotation was
taken by means of a Starrett counter which was fastened into the rotating spindle.
When the subject reported fusion the time for three hundred revolutions was taken in
seconds. The procedure of the experiment was as follows. After the subject was
placed comfortably in his chair the rate of rotation was so speeded up that the red
and the green were completely fused and was then slowed down. The experimenter
pointed out during this exhibit the flickering and fusion which occurred both on in-
crease and decrease in speed. This was repeated a second time. Then the disc was
speeded up and the subject was instructed to report “Now” as soon as he could
distinguish the red and the green. The rotator was then gradually slowed until the
subject reported. At this time the speed was recorded and the motor shut off. Then
the instructions were given: “When the red and green mix so that you can’t see either
of them any more say ‘now.’” The speed of rotation was gradually increased until
the subject reported fusion. Three readings were taken on the increase and three
on the decrease. Although the directions were as simple as possible, there was con-
siderable difficulty. A great many changes occur in the course of speeding up or
slowing the disc, and it was very difficult to indicate what these untrained subjects
should report. Nor was it possible to determine whether the observers were reporting
the first indications of a change, or substantial fusion. We cannot state what the
test denotes.


TABLE XIX
SPEED OF ROTATION AT WHICH COLOR FUSION OCCURS

                                        Asthenic   Athletic   Pyknic
No. sec per 300 revolutions ... Mean     27.06      25.79      28.83
                                S.D.      7.16       3.14       4.33

























In computing the results all six readings were tabulated, 
since the gross limen was desired. The differences as shown 
by the averages are given in seconds for three hundred 
revolutions. The athletic subjects were much less variable 
than either of the other two groups and fusion occurred only 
at a more rapid speed of rotation. It is to be noted that van 
der Horst’s group of leptosomic subjects correspond in physical 
index with our athletic group. The differences here demon- 
strated, however, are not as extreme as those he reported. 
Another factor to be considered is the age of the pyknic 
subjects. The athletic and asthenic groups were combined 
in the following table because of the small number of asthenic 
and athletic men in the more advanced ages. Tabulated by
ages the means for the groups appear in Table XX. 














TABLE XX
RELATIONSHIP BETWEEN AGE AND SPEED OF ROTATION AT WHICH COLOR FUSION OCCURS
Time (sec) for 300 revolutions.

Age ............................  18-24    25-34    35-40    41-49    50-56
Athletic and asthenic (sec) ....  25.25    26.1     22.8     30.8
Pyknic (sec) ...................  30.5, [illeg.], [illeg.], 32.7 (columns uncertain)
Range ..........................  20-34    18-38    18-32    23-40    26-41
Cases: Athletic and asthenic ...  16       24       3        2        0
       Pyknic ..................  11       17       4        7        5


Although the distributions are all of wide range, there is a 
constant difference between the two groups. The pyknic 
subjects usually report fusion at a lower speed of rotation 
than do the members of the other two groups of comparable 
age. The clearly marked differences and the consistency of 
reports of van der Horst’s subjects contrast markedly with 
the difficulties our subjects had in understanding precisely 
what it was they were to report upon. 
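
Table XIX gives times in seconds for 300 revolutions, whereas van der Horst's figures are quoted in revolutions per minute. A small sketch of the conversion may help in comparing the two sets of figures. The function name is hypothetical. Converting a group's mean time is not the same as averaging each subject's individual rate, so the results should be read only as rough equivalents.

```python
def revolutions_per_minute(seconds_per_300_rev: float) -> float:
    """Convert a timing for 300 revolutions (in seconds) to revolutions per minute."""
    if seconds_per_300_rev <= 0:
        raise ValueError("timing must be positive")
    return 300.0 / seconds_per_300_rev * 60.0


# Group means from Table XIX (seconds per 300 revolutions).
table_xix_means = {"asthenic": 27.06, "athletic": 25.79, "pyknic": 28.83}

for group, seconds in table_xix_means.items():
    print(f"{group}: {revolutions_per_minute(seconds):.0f} rev/min")
```

On this rough conversion the three group means fall between about 620 and 700 revolutions per minute, with the athletic group highest and the pyknic group lowest, in the same direction as the differences discussed above.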

10. Substitution Test: Since disturbance of the associative 
mechanism is an outstanding characteristic of schizophrenic 
disorders, tests were sought that might measure differences in 
these processes. Van der Horst reports only slightly signifi-
cant results with the Kent-Rosanoff word-association test.










In the preliminary study among Joliet prisoners, the Kent- 
Rosanoff test showed no marked differences. Perhaps a 
better test of association would be the learning and recall of 
nonsense materials since this would avoid associations pre- 
viously made. Cooperation sufficient to carry out this experi- 
ment among the prisoner subjects could not be gained. 

The group-test was primarily designed to include the 
substitution test which is a modified form of learning. This 
contains a random array of ten different capital letters in 
fifteen rows of fifteen each. At the top of the page a key is 
given with the numbers which are to be substituted for the 
letters placed under them. Neither numbers nor letters are 
in a consecutive order. The test consists in putting the 
proper numbers under the corresponding letters. This test
was used at the beginning and at the end of the group-test. 
On both occasions the subjects were stopped at the end of 
every minute and were instructed to make a mark to indicate 
how far they had gone. This made it possible to determine 
the learning curve for each of the three groups. Five minutes 
was allowed on the first and three minutes on the last test. 


TABLE XXI
LEARNING TESTS

Number of Substitutions in                       Asthenic  No.    Athletic  No.    Pyknic  No.
                                                           cases            cases          cases
A. First 5 minutes ........ Mean                  143.4    19     144.9     22     130     38
                            S.D.                   47.4            39               47.8
B. Last 3 minutes ......... Mean                  114.5    19     109.6     24     110.0   38
                            S.D.                   36.7            38.5             38.4
C. Total 3 minutes (B) /
   Total 5 minutes (A) .... Mean                   82.2    19      80.45    24      84.35  40
                            S.D.                   10.7            11.75            14.15
D. Initial minute ......... Mean                   25.3    19      23.3     22      22.    34
                            Median                 25              23               21
                            Inter-quartile range   18-31           8-27             16-26
E. ........................ Mean                   34.8    19      33.1     22      31.5   39
                            Median                 33              31               31
                            Inter-quartile range   26-41           28-42            25-37
F. 1st minute first trial . Mean                   40.0    19      34.8     23      33.4   40
                            Median                 35              35               30
                            Inter-quartile range   26-49           29-39            24-39
G. 3d minute final trial .. Mean                   41.5    19      39.0             36.7   40
                            Median                 42              38               34
                            Inter-quartile range   31-49           31-45            26-40
















The means for the first and last minute in both the initial 
and the final tests consistently rank the groups in the order 
asthenic first, athletic second, and pyknic third. Comparing 
the totals for the two periods, it appears that the pyknic 
subjects have relearned more rapidly after the interval than 
have the asthenic or athletic subjects. To show the relative 
improvement on the part of the pyknic subjects, the relation- 
ship between the performance of the last 3-min period to 
the first 5-min period for each subject was tabulated (Table 
XXI, C). This difference may be due in part to the fact 
that some of the less intelligent pyknic subjects were unable 
to understand the directions, and consequently do relatively 
much more poorly during the first few minutes of the test. 
This is shown by the number of cases included in computing 
the averages for the pyknic subjects in the above table. 
Four of the pyknic subjects never got the directions straight, 
and indiscriminately placed numbers under the letters, and 
as a consequence had zero scores. These were not included 
in the computations. Of the seven highest men in the last 
3-min test, five were pyknic subjects. In view of the lower 
intelligence of the pyknic subjects this fact assumes greater 
significance. 
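
The relative-improvement measure of Table XXI, C can be sketched as a per-subject percentage. This assumes that C expresses each subject's total for the last 3 minutes as a percentage of his total for the first 5 minutes, which is a reading of the table rather than something the text states outright. The function name and the scores below are hypothetical.

```python
from statistics import mean


def relative_improvement(first_five_min, last_three_min):
    """Last-3-minute score as a percentage of the first-5-minute score, per subject.

    Subjects with a zero first-trial score are skipped, mirroring the
    exclusion of zero scorers described in the text.
    """
    ratios = []
    for first, last in zip(first_five_min, last_three_min):
        if first > 0:
            ratios.append(100.0 * last / first)
    return ratios


# Hypothetical scores for four subjects; not data from the study.
first = [150, 120, 0, 100]
last = [120, 108, 0, 90]
ratios = relative_improvement(first, last)
print(ratios, mean(ratios))
```

Averaging the per-subject percentages, rather than dividing one group mean by the other, keeps a few very high scorers from dominating the result.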


TABLE XXII
CORRELATIONS WITH SUBSTITUTION TEST

                            Index        Alpha      Index, Alpha
                           of Build                 held Constant
.......................     −.154        .594           .06
.......................     −.177        .673           .07
.......................     +.135       −.079          −.11





11. Writing Backward: Another test which involves the 
breaking up of old habits, as well as the establishment of 
new, is writing backward. The subjects were asked to 
write backward “The United States of America” three times
as rapidly as possible. Mirror drawing might have been 
more fruitful, but we were unable to utilize this test. The 
reversed writing, however, shows some differences which are 
of the same order as those of the substitution test. 










TABLE XXIII
TIME (SEC) REQUIRED TO WRITE BACKWARD “THE UNITED STATES OF AMERICA”

                                          Asthenic  No.    Athletic  No.    Pyknic    No.
                                                    cases            cases            cases
A. Third trial .................. Mean      53.6    18      43.5     25      49.7     43
                                  S.D.      20              12.6             21.7
B. Diff’ce in time bet. 1st and
   3d ........................... Mean      22.3    18      26.9     24     [illeg.]  38
                                  S.D.      16.3            21.5             25.4








Here the asthenic subjects do poorest on the third trial, 
even though they excel on the first attempt. They seem less 
adaptable than the other groups despite their higher scores on 
the intelligence test. This test is subject to the same errors 
as the other writing tests, namely, those of previous practice 
on the part of some of the subjects and of semi-illiteracy on 
the part of others. The results seem to stand despite these 
sources of error. It appears according to this test that the 
asthenic subjects are slowest in the establishment of this new 
performance. The pyknic subjects do relatively better than 
the other two groups both in the last three minutes of the 
substitution test and in the third trial of writing backward. 
In other words, their performance improves relatively the 
most rapidly, despite lower intelligence. This may indicate 
relatively less efficiency of associative processes on the part 
of the asthenic group, a concept in keeping with the pre- 
sumable relationship of this group with the schizophrenic 
disorders. 

12. Information Test: An attempt was made to devise an 
information test which would reveal differences in interests as 
between the groups, such as might conform to those described 
as characteristic of schizothymic and cyclothymic tempera- 
ments. The schizoid is said to be egocentric; his interests 
are those that have a very personal bearing, and are some- 
what divorced from the every-day social and practical con- 
siderations of his environment. The cyclothyme, on the 
other hand, is said to be in very close touch with the world 
about him. His interest is in the events, objects and 
occurrences of his immediate experience. 



The questions were devised to test on the one hand the 
information or interest in sports, automobiles, machinery and 
the like, and on the other hand information as to books, 
authors, music and other items presumably of greater interest 
to the less ‘social’ individual. The questions were of the 
multiple choice type.³⁴ Although questions of the first type
were easy to devise, great difficulty was encountered in 
finding items of the second variety within the scope of the less 
intelligent prison inmates. As a result, there were 35 ques- 
tions of the first variety and 15 of the second. 

The groups did not divide along the lines anticipated. 
There were but seven questions in which the pyknic men gave 
a larger number of correct responses. Five of these were 
of Type I, and two of Type II.³⁵





34 A few examples are:

A famous opera singer is—PADEREWSKI, DANTE, GALLI-CURCI, 
HEIFITZ. 

On a football team there are—5, 11, 9, 16 MEN. 

“One-Eleven” is the name of a—CARD GAME, SMOKING TOBACCO, 
CIGARETTE, SHAVING CREAM. 

Walt Whitman was a—POET, NOVELIST, LEADER OF A JAZZ BAND, 
MISSIONARY. 

Big Ben is a—PRIZE FIGHTER, RACE HORSE, ALARM CLOCK, 
TRUCK.


35 These items were:

A famous tennis player is—VINCENT RICHARDS, FRANCIS OUIMET,
JOEY RAY, RED GRANGE.

“The Rhyme of the Ancient Mariner” was written by—SIR WALTER
SCOTT, ROBERT SERVICE, TENNYSON, COLERIDGE.

The score for a touchdown in football is—3, 6, 7, 9.

“99.44% Pure” advertised—MEADOWBROOK BUTTER, MENNEN’S
TALC, IVORY SOAP, FLEISCHMAN’S YEAST.

“Wear-Ever” is a brand of—SHOES, CARPETS, TOOLS, ALUMINUM. 

Scales are manufactured at—NEW YORK, TOLEDO, BOSTON, FAIR- 
BANKS. 

Christmas comes this year on—TUESDAY, THURSDAY, SATURDAY, 
SUNDAY. 

On this test six minutes were allowed. This was a very long time for the test 
and a large number of the asthenic and athletic men were able to finish. The time 
was made long to compensate to some extent for the slowness of the pyknic group. 
The difference of the means for the test would be significantly larger had the time 
been cut down to, say, four minutes. 










TABLE XXIV
RESULTS OF INFORMATION TEST

Information Test                         Asthenic   Athletic   Pyknic
A. Number correct ............. Mean       29.2       25.4      18.2
                                S.D.       11.6       10.6      11.3
B. Number incorrect ........... Mean       11.5       13.1      12.8
                                S.D.        6.5        6.7       6.2
C. Correct, 43 questions ...... Mean       27.1       23.16     16.13
                                S.D.        9.1        9.26     10.2

















It is noteworthy that all the groups missed a large number 
of questions, the athletic and the pyknic subjects giving about 
the same number of incorrect responses. 

The information test has the second highest correlation
with the index of build of any of the tests thus far considered.
The correlation is −.354. The information test correlates
with Alpha +.857. With Alpha partialed out, the informa-
tion test correlates with the index −.132. The index corre-
lated with Alpha, with the factor measured by the information
test held constant, only −.028. It appears that the informa-
tion test measures that factor which is responsible for the
correlation of Alpha with the index.

13. Rorschach Test: Each subject was given the Ror-
schach test.³⁶ The responses were classified in accordance
with the criteria as given by Rorschach. Our utilization of
this test must be considered as very rough, but certain
differences in character of response were readily discernible.

The pyknic and asthenic subjects gave fewer responses
than did the athletic. In view of the high intelligence of the
asthenic subjects, the relative paucity of their responses is
noteworthy, and may be attributed to repressive factors.
Practically no sex responses were given by the athletic and
asthenic men; four of the pyknic subjects gave sex responses.
The pyknic subjects were unable to give responses to an
appreciably greater number of cards. In a classification of
the individual records according to the Rorschach diag-
nostic tables,³⁷ no differences were noted in the normal or


36 H. Rorschach, Psychodiagnostik, Leipzig, 1921.
37 Ibid., pp. 40-41.











































schizophrenic groupings. Very few of the athletic and 
asthenic individuals classify under the melancholic and manic
disorders, while approximately a fourth of the pyknic subjects 
fall in this group. The large material obtained calls for 
individual and detailed consideration of each case and full 
report of this material must await a later time. 

14. Social Data as Related to Physical Build: A distri- 
bution of certain of the data given in the prison records is 
made in Table XXV. This table shows that a marked 
preponderance of the crimes committed by the athletic and 
asthenic groups are crimes against property. While this is 
also true of the pyknic group, the proportion of crimes 
against the person (assault and sex offenses) is much greater. 
These latter crimes are generally considered characteristic of 
a low grade of intelligence. 

A relatively large proportion of the pyknic group have had 
no previous prison records. This may be somewhat de- 
pendent on the character of the crime shown to be usual to 
this group. It may also mean that the members of this 
group get into difficulty less frequently. This assumption is 
further supported by the evidence of greater stability among 
the pyknic group. More of them are married, and more of 
them belong to fraternal organizations. The factor of age 
must be considered in relation to these two latter items. 
Many of the pyknic subjects gave an occupation classifiable 
as outdoor manual labor. The significance of these social 
facts cannot be valued with so small a number of cases, and 
a study now in progress may offer more information. 

Special effort was made to determine the nativity of the 
antecedents of the subjects. When parents were reported 
as native-born, the nationality of the grandparents was 
ascertained. The tabulation of nativity of antecedents in 
Table XXV includes reports given for 14 grandparents among
the asthenic, 31 among the athletic, and 48 among the pyknic 
subjects. The asthenic men are predominantly of Irish 
extraction. The athletic group springs largely from British 
Isles stock. The pyknic group is scattered in origin; although 
there is a distinctly greater incidence of German and Slavic
ancestry. The differences indicated are appreciable but are
not sufficiently distinct to be confirmed in our small group. 


TABLE XXV

SOCIOLOGICAL DATA FOR THE PHYSICAL GROUPS

                                                     Asthenic       Athletic        Pyknic
                                                     (N = 19)       (N = 26)       (N = 44)
                                                     No.    %       No.    %       No.    %
I.   Crime:
       Larceny, robbery, burglary                     17   89.5      22   84.2      23   52.0
       Forgery, con game, embezz., extortion           2   10.5       1    3.8      10   22.7
       ‘Crime against nature,’ rape, incest                           3   11.5       9   20.5
       Accessory to escape                                                           2    4.5
II.  No previous imprisonment                          1    5.2      11   42.3      19   43.2
III. [label illegible]                                 7   37.0       9   34.6      30   68.0
IV.  National origins:
       [illegible]                                    17             13             18
       [England], Scotland, Wales                      4             12             12
       [illegible]                                     2              3              6
       [illegible]                                [illegible]         4             22
       Bohemia, Aus.-Hungary, Poland, Lithuania,
         [illegible]                                   1              2             12
       [illegible]                                     3              1              9
       [illegible]                                     7              7              9
V.   [label illegible]                                 3   15.8       7   27.0      26   59.1
VI.  Not member of social organization                10   52.8       8   30.9      10   22.7
VII. Average age of married men                          34.0           28.9           36.6
     Average age of men not in social organization       26.6           24.3           30.5
     Average age for total                               28.5           28.6           34.7























SUMMARY AND CONCLUSIONS 


From a large prison population 89 native white men 
were selected. Of these men, 19 were of the physical build 
conforming to that of Kretschmer’s asthenic type, 26 of the 
athletic and 44 of the pyknic type. These men conformed to 
the distinguishing characters and physical measurements for 
the respective types in accordance with Kretschmer’s consti- 
tutional scheme. By the use of a crude index, the selection 
of men was such as to minimize overlapping of the measurements
of the individuals of the various groups. These
selected men were subjected to a large number of tests the 
results of which have been summarized in Table XXVI. The 
differences between the various means are expressed in terms 
of reliability of the difference; that is to say, with respect
to the number of cases and the spread of the distributions.³⁸
The significant differences have been utilized in the construc- 
tion of Chart IV, where the tests have been grouped under head- 
ings representative of the type of function they may in part 
measure. The athletic group is taken as the base or zero 
abscissa from which the other two groups deviate. The 
performance of the athletic group is used as the basis for 
comparison, because these subjects are more representative 
of the ‘normal’ physique and because the variations of the 
asthenic and pyknic groups take on greater significance in 
reference to the athletic group than in reference to each
other.


TABLE XXVI

DIFFERENCES BETWEEN EACH PAIR OF GROUPS ON THE TESTS EXPRESSED IN TERMS
OF STANDARD DEVIATIONS OF THE DIFFERENCE

                                                   The difference between the means
                                                   in terms of S.D.diff. is equal to
Test                                               Asth.-Ath.   Ath.-Pyk.   Asth.-Pyk.

 1. [illegible]                                       1.47        1.90        3.17
 2. [illegible]                                        .20         .30         .58
 3. [illegible]                                        .10         .72         .74
 4. [illegible]                                       1.94        1.24        3.20
 5. Writing U. S. of A. backward, 3d trial            1.94        1.50         .69
 6. Reaction time, dis. regular                        .96        -.15         .85
 7. [illegible]                                        .83         .44        1.13
 8. [illegible]                                       1.69        2.00        3.78
 9. [illegible]                                       1.78        2.35        4.05
10. [illegible]                                       1.58        2.21        3.88
11. [illegible]                                      -1.27        2.07         .18
12. [illegible]                                      -1.09        2.94        1.10
13. Young light series, selected                      -.94        3.79        2.21
14. [illegible]                                        .70        2.62        2.83
15. [illegible]                                        .88        2.63        3.55
16. [illegible]                                       -.73        3.42        1.02
17. [illegible]                                        .50       -1.16        -.64
18. [illegible]                                       1.13        2.68        3.49
19. [illegible]                                       1.59        3.37        4.25








Note.—The scores are plus or minus, depending upon which is the ‘better’ mean. 
A fast reaction-time, low coef. of var, rapid rate of rotation for color-fusion, as well 
as a large score on Alpha, are considered high. 

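38 The differences are given in terms of the standard deviation of the difference:

$$\frac{\text{diff.}}{\text{S.D.}_{\text{diff.}}}, \qquad \text{S.D.}_{\text{diff.}} = \sqrt{\sigma_{m_1}^{2} + \sigma_{m_2}^{2}}.$$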
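As a rough check on how such a ratio is formed, the sketch below applies the footnote's formula to the "number correct" figures of Table XXIV and the group sizes given in the summary (19, 26, 44). It assumes that the standard error of each mean is S.D./√N, which the footnote does not spell out; the function and variable names are illustrative only.

```python
import math


def difference_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of two means divided by the S.D. of that difference."""
    se_a = sd_a / math.sqrt(n_a)   # ASSUMPTION: sigma_m = S.D. / sqrt(N)
    se_b = sd_b / math.sqrt(n_b)
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (mean, S.D., N) for number correct on the information test, Table XXIV.
GROUPS = {
    "asthenic": (29.2, 11.6, 19),
    "athletic": (25.4, 10.6, 26),
    "pyknic": (18.2, 11.3, 44),
}

if __name__ == "__main__":
    for a, b in [("asthenic", "athletic"), ("athletic", "pyknic"), ("asthenic", "pyknic")]:
        print(a, "-", b, round(difference_ratio(*GROUPS[a], *GROUPS[b]), 2))
```

Under that assumption the three ratios come out within about .01 of the row of Table XXVI that reads 1.13, 2.68, 3.49.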











[Chart IV. Panels: Intelligence, Speed of Reaction, Attention, Learning. The athletic
group forms the zero line. The vertical scale gives the deviation of the asthenic and
pyknic means in hundredths of S.D.diff.; each level is labelled with the number of
times in which a difference as large as this, and in this direction, may occur in
uncorrelated data (2 times at 0, 6.3 at 100, 44 at 200, 161 at 250, 742 at 300,
4299 at 350).]

CHART IV. The difference between mean performances for asthenic-athletic and
pyknic-athletic, in terms of S.D.diff., on the significant tests.
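The chance frequencies marked on the scale of Chart IV agree with one-tailed areas of the normal curve. The sketch below is an editorial illustration under that assumption, not the authors' procedure: it converts a difference expressed in units of S.D.diff. into "once in N times" (the function name is hypothetical).

```python
import math


def once_in_n_times(ratio):
    """Reciprocal of the one-tailed normal probability beyond `ratio`."""
    tail = 0.5 * math.erfc(ratio / math.sqrt(2))
    return 1.0 / tail


if __name__ == "__main__":
    for ratio in (0.0, 1.0, 2.0, 2.5, 3.0, 3.5):
        print(ratio, round(once_in_n_times(ratio), 1))
```

A ratio of 1.00 thus corresponds to about 6.3 times, 2.00 to about 44 times, and 2.50 to about 161 times.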


Our results seem to justify the following conclusions. 

(1) The incidence of the physical types among the prison 
group is not markedly different from that observed by a 
number of investigators among the schizophrenic patients, 
and by Gruhle among normal individuals. It is similar to 
that observed by von Rohden among convicts, although the 
incidence of pyknic forms among the Joliet group is somewhat 
larger. 

(2) Although the physical types as Kretschmer describes 
them can be found in a prison population they can not be 
precisely differentiated by mere inspection. It is found that 
a number of men classed as asthenic have relatively larger 
chest, hip and abdominal measurements than do many men
classed as athletic (Table IV). The subjectively classified
athletic men run considerably into the range of the pyknic 
men on the same criteria (Tables IV and VI). The distri- 
butions both for the estimate of type (Table III) and for 
the physical measurements (Tables VI and VII) indicate that 
we are dealing with a normal distribution constituting a 
continuous progression from the characteristics that define 
the extreme asthenic to those that determine the ‘best’ 
pyknic habitus. 

(3) Grouping of the asthenic and athletic men under a 
single category of leptosomes does not seem to be justified 
on the basis of any of our physical or testing results. 

(4) Whatever the exact significance of the physical types 
may be, there is no doubt that differences in performance of 
the groups so selected can be demonstrated (Chart IV). 

(5) Among the most striking differences obtained between 
the groups were those on Alpha (Tables VIII and IX) and 
the Information test (Table XXIV and Chart IV). Such
differences have not been previously emphasized. The differ- 
ences we have found may be due to the selection of the 
subjects; i.e., it is possible that only the relatively lower-grade
pyknic men are found in prisons.

(6) The results of the color-fusion test (Tables XIX and 
XX) cannot be interpreted. 

(7) The remaining determined differences indicate (a) that 
the asthenic subjects show relatively more schizothymic 
tendencies and (b) the pyknic subjects more cyclothymic 
tendencies. 


(a) Those tests which are included under the headings ‘attention’ and ‘learning’ 
in Chart IV show the asthenic subjects lower than the athletic subjects in three of
the four tests, and lower than the pyknic subjects in the fourth. Disorders of attention 
and association are said to be characteristic of schizophrenia. As has been pointed 
out, however, the learning tests (Tables XXI, XXIII) are not very reliable; and it 
may be questioned what the ‘attention’ tests (Tables XVI and XVII) measure. The 
sociological facts (Table XXV) reveal a more ‘unsocial’ character on the part of the 
asthenic men. This is in keeping with the schizothymic temperament. 

(b) Those tests which might have demonstrated differences based upon ‘inhibition’
in the pyknic subjects, and ‘blocking’ in the leptosomic group (reaction-time with 
distraction, Table XIV; writing-while-counting, Table XV) did not distinguish the 
groups, nor did the information-test (Table XXIV) distinguish the groups on the 
basis of their interests. The characteristics of the cyclothymic temperament, however,
have been measured in one respect; that of speed of performance. In straight reaction-
time (Table XIV), speed of writing (Table XII), and cancellation (Table XVIII), 
the pyknic subjects are decidedly lower than the other groups; while the asthenic 
group is quite superior. The Rorschach Test tends to characterize the pyknic men 
as of melancholic or manic dispositions, while the sociological material (Table XXV) 
indicates a more social life. 


(8) Our results support in a general way the Kretschmer 
theory of physical and temperamental kinds, in that a 
relationship between physique and character of performance 
is demonstrated. They tend to modify the theory, however, 
by breaking down even Kretschmer’s loose conception of 
‘types’ and insisting on the concept of a general progression 
both of performance and of physical characteristics. 

There is abundant evidence in the literature that manic- 
depressive insanity is associated with the pyknic build. 
Certain investigators (e.g., Mauz) have endeavored to show
that schizoid ‘coloring’ enters into the manic-depressive case 
when there are asthenic physical components, and that 
cycloid elements enter into the schizophrenic case when there 
are pyknic physical attributes. The deviations in personality 
that constitute the symptoms of the insanities and that 
determine the ‘temperaments’ may be conceived as ranging 
from extreme schizoid characteristics on the one hand to 
extreme manic-depressive characteristics on the other, in 
a manner similar to the distributions we have found both 
for performance and for physical measurements. 

With a correlation of performance and physical charac- 
teristics it is quite reasonable that a more easily recognized 
constellation of physical and psychological attributes should 
impress the clinical investigator as a ‘type.’ The exceptions 
are so numerous, however, as seriously to impair the validity 
of this ‘type’ as an entity. Although there is no incompatibility
between our results relative to physique and performance
and those of Kretschmer relative to physique and
temperament, an interpretation of the facts at hand does not 
require the retention of the concept of ‘type.’ 











AN ANALYSIS OF EYE MOVEMENTS IN THE 
READING OF CHINESE 


BY EUGENE SHEN 


Psychological Laboratories, Stanford University 


INTRODUCTION: CONDITIONS OF THE EXPERIMENT ¹


In a previous article ² the photographic equipment used in
the Psychological Laboratories of Stanford University in connection
with this research was fully described and preliminary
results were presented on the reading of Chinese in vertical
and in horizontal alignment. More extensive observations
have subsequently been completed, with the reading material
so printed as to make the vertical and horizontal styles of
alignment strictly comparable.³ It is the purpose of the
present paper to make a statistical summary of the results 
and to discuss certain points that seem to the writer to be of 
significance. 


The reading material used requires a brief description. From each of six different 
sources, widely varying in style and content, two comparable passages of about 500 
words were selected. Each of these twelve selections was printed both in vertical 
and in horizontal alignment. The type used was No. 4, approximately ;%-inch square 
to the word. The line was 6½ inches in length, containing 32 spaces for (24 to 29)
words and (8 to 3) punctuation marks. Instead of the more common practice of 
placing the punctuation marks to the right of the words irrespective of alignment, 
which would lengthen the horizontal line and broaden the vertical, we decided to 
put all marks within the line of words, regularly giving each mark the space of one 
word. Every passage had exactly 18 lines, ending in a complete sentence and making 
the whole printed text practically a perfect square. 





1 The writer is greatly indebted to Professor Walter R. Miles for general super- 
vision and constant encouragement throughout the entire study as well as for much 
time and labor spent during the preliminary stages. Grateful acknowledgment is 
also due to Professor Truman L. Kelley for suggestions concerning the statistical 
treatment of the data. 

2 W. R. Miles & E. Shen, Photographic recording of eye movements in the reading
of Chinese in vertical and horizontal axes; methods and preliminary results, this
JOURNAL, 1925, 8, 344-362.

3 Made possible by the Thomas Welton Stanford Fund for Psychological Research. 
Credit must be given to the Young China Press of San Francisco for undertaking the 
difficult task of printing. 


The subjects were thirteen Chinese students at Stanford University, all having
served in the preliminary experiments. They were divided into two groups of six 
and seven persons. The division was made with a view to avoiding unequal distri- 
butions of reading ability as far as could be expected from the preliminary results. 
Observations upon each subject were completed in two sessions, each lasting about 
an hour and separated by a variable interval of a week or longer. Designating the 
source of reading material by the letters A-F and the two passages from the same
source by the figures 1 and 2, the full program will be clear from the following
tabulation.











                        Selections Read by       Selections Read by
                        Group I, in Order        Group II, in Order
First Session ........  (1) A1, vertical         (1) A1, horizontal
                        (2) B1, horizontal       (2) B1, vertical
                        (3) C1, vertical         (3) C1, horizontal
                        (4) D1, horizontal       (4) D1, vertical
                        (5) E1, vertical         (5) E1, horizontal
                        (6) F1, horizontal       (6) F1, vertical
Second Session .......  (1) A2, horizontal       (1) A2, vertical
                        (2) B2, vertical         (2) B2, horizontal
                        (3) C2, horizontal       (3) C2, vertical
                        (4) D2, vertical         (4) D2, horizontal
                        (5) E2, horizontal       (5) E2, vertical
                        (6) F2, vertical         (6) F2, horizontal











Thus the same subject always read the two selections from the same source in 
the two different styles of alignment; and so was the same selection always read by 
the two groups of subjects. This system of alternation gives a safeguard to our 
requirement that the vertical and horizontal readings be made under the same con- 
ditions. The ability of the subject, the difficulty of the material, and the order of 
reading could not here introduce any spurious difference between vertical and hori- 
zontal reading, as each of these factors was so controlled as to affect the two types of 
reading precisely in the same manner. 
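The system of alternation just described can be written down compactly. The sketch below is an editorial illustration, not part of the original procedure: it regenerates the program of readings from the rule that alignment alternates with each successive selection and is reversed between groups and between sessions. The function and variable names are hypothetical.

```python
SOURCES = "ABCDEF"   # six sources of reading material
GROUPS = (1, 2)      # Group I and Group II
SESSIONS = (1, 2)    # passage 1 is read in session 1, passage 2 in session 2


def schedule(group, session):
    """Return the ordered list of (selection, alignment) for one group and session."""
    plan = []
    for position, source in enumerate(SOURCES):
        # Group I begins session 1 with a vertical selection; every change of
        # position, group, or session reverses the alignment.
        reversals = position + (group - 1) + (session - 1)
        alignment = "vertical" if reversals % 2 == 0 else "horizontal"
        plan.append((f"{source}{session}", alignment))
    return plan


if __name__ == "__main__":
    for group in GROUPS:
        for session in SESSIONS:
            print("Group", group, "session", session, schedule(group, session))
```

Each group therefore meets every source once in each alignment, and every selection is read vertically by one group and horizontally by the other.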

The subjects were instructed to read both understandingly and rapidly, and 
were told to be prepared for questions after reading each selection. This uniform 
instruction of course could not control the individual differences in general tempera- 
ment or specific habit; in fact it was obvious that the various subjects were differently 
motivated toward speed and comprehension. 

Since there were thirteen subjects and each read twelve selections, the total 
number of photographic records taken amounted to 156. A number of the records 
were, however, not completely legible. In general, those showing at least eight
complete lines were used in the study of the number of pauses per line, and those 
showing at least a hundred distinct pauses were used in the study of the duration per 
pause. In each case 111 records entered into the following summary of results; but 
they were not in every case identical. 


THE NUMBER OF PAUSES PER LINE


We shall first consider the number of fixation pauses 
required for the reading of a line. To characterize a distribution
of pause-frequencies per line, the statistical measures
of central tendency, variability, skewness and kurtosis
naturally suggest themselves. The particular measures that 
have been selected and calculated are defined as follows: 
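For central tendency

$$M = \frac{\sum X}{N}$$

For variability

$$\sigma = \sqrt{\frac{\sum x^{2}}{N}}$$

For skewness

$$\beta_1 = \frac{\mu_3^{2}}{\mu_2^{3}} = \frac{\left(\sum x^{3}/N\right)^{2}}{\left(\sum x^{2}/N\right)^{3}}$$

For kurtosis

$$\beta_2 = \frac{\mu_4}{\mu_2^{2}} = \frac{\sum x^{4}/N}{\left(\sum x^{2}/N\right)^{2}}$$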


In the above equations X denotes the number of pauses
per line, and x denotes the deviation from the mean. The
mean as a measure of central tendency and the standard
deviation as a measure of variability are familiar and need
no comment. The β constants, however, warrant a brief
explanation. The measure β₁ is obtained by dividing the
square of the third moment (μ₃) by the cube of the second
moment (μ₂). A value of zero for β₁ means that the distribution
is symmetrical, and the greater the β₁ the more skew
the distribution. It will be noted that β₁ is an index of the
degree of skewness, irrespective of its direction: that is to
say, due to the squaring of μ₃ in the derivation, β₁ is never
negative, and it will remain the same if the curve of distribution
is rotated 180° around the mean as an axis. This is
obviously a drawback, as the direction of skewness is a
significant item of information. We shall therefore in presenting
the figures place a minus sign in front of every β₁
which has a negative μ₃ in its derivation. The measure β₂
is the fourth moment (μ₄) divided by the square of the
second moment (μ₂). The normal distribution is mesokurtic
(β₂ = 3). Leptokurtic distributions (β₂ > 3) have measures
densely grouped at the average, with a relatively high peak
and long tails. Distributions that are platykurtic (β₂ < 3),
on the other hand, are rather flat in the middle and contracted 
at the ends, thus tending toward a rectangular shape. Distri- 
butions with a small population are often platykurtic, and 
the measure of their kurtosis has a large probable error. 
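The four constants defined above, including the signed β₁ convention adopted for the tables, can be computed as in the following sketch. It is an editorial illustration: the function name is hypothetical, and the record of pauses per line in the usage lines is invented and does not come from the study.

```python
def pause_statistics(pauses_per_line):
    """Return M, sigma, signed beta1 and beta2 for one record of pauses per line."""
    n = len(pauses_per_line)
    mean = sum(pauses_per_line) / n
    deviations = [x - mean for x in pauses_per_line]
    mu2 = sum(d ** 2 for d in deviations) / n   # second moment
    mu3 = sum(d ** 3 for d in deviations) / n   # third moment
    mu4 = sum(d ** 4 for d in deviations) / n   # fourth moment
    beta1 = mu3 ** 2 / mu2 ** 3
    # Convention of the text: prefix a minus sign when the third moment is negative.
    signed_beta1 = -beta1 if mu3 < 0 else beta1
    return {
        "M": mean,
        "sigma": mu2 ** 0.5,
        "beta1": signed_beta1,
        "beta2": mu4 / mu2 ** 2,
    }


if __name__ == "__main__":
    # Hypothetical record of 18 lines; invented for illustration only.
    record = [17, 15, 14, 16, 13, 15, 14, 18, 15, 14, 13, 16, 15, 14, 21, 15, 14, 13]
    print(pause_statistics(record))
```

The sign attached to β₁ carries the direction of skewness, which the unsigned constant discards.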

These four statistical constants we shall not here present 
for each of the 111 distributions of pause-frequencies per line. 
Instead, Table I will summarize the data by giving in each 
of the four cases the mean and the standard deviation of the 
entire group of these constants considered as a distribution, 
and the range of the averages both by subject and by material. 
Records for vertical and for horizontal reading have through- 
out been kept separate, the number of records entering into 
the results being 54 and 57, respectively. 


Part A of Table I presents the average number of pauses required in the reading 
of a line, probably the most significant item of information indicating the ability of 
the reader and the difficulty of the material. It shows that in general the eye stopped 
15 times in reading a vertical column and 18 times in a horizontal line, but variations 
within the group were considerable, as shown by the other three columns of the table. 
A comparison of the range of averages by subject with those by material will confirm 
our preliminary findings that the individual subjects were much more variable than 
the selections of reading material, although the twelve passages seem to the writer to 
cover a rather wide range of style and content. 


TABLE I

SUMMARY OF STATISTICAL CONSTANTS OF 111 DISTRIBUTIONS OF
PAUSE-FREQUENCY PER LINE

                   Mean of     S.D. of     Range of           Range of
                   Whole       Whole       Averages by        Averages by
                   Group       Group       Subjects           Material

A. Average pause-frequency per line (M)
Vertical ......     15.0        3.32       11.3-25.3          12.9-17.6
Horizontal ....     18.3        5.86       12.5-38.5          13.7-23.8

B. Variability of pause-frequency per line (σ)
Vertical ......     2.73        1.25       1.94-8.18          1.88-4.12
Horizontal ....     3.79        3.32       2.15-15.99         2.25-7.27

C. Skewness of pause-frequency distributions (± β₁)
Vertical ......     +.51       ±1.00       -.30 to +1.34      -.02 to +1.53
Horizontal ....     +.29       ± .50       +.01 to + .70      -.12 to +1.06

D. Kurtosis of pause-frequency distributions (β₂)
Vertical ......      2.7         1.3       1.5-4.0            1.7-3.8
Horizontal ....      2.4          .9       1.8-2.9            2.0-3.3

















The upper limit of the range of the averages by subject 
was very high, 25.3 pauses per vertical line and 38.5 pauses 
per horizontal line. These figures were due to the records of 
a subject who was apparently so set upon preparation for 
questioning that his attitude was one of studying and trying 
to remember the contents rather than merely reading as 
ordinarily practised. As a result, he gave by far the most 
accurate and thorough answers, and was easily the best in the 
test of comprehension and retention. But he was no less 
certainly the slowest reader of the thirteen subjects. In fact 
he read so slowly that usually the photographic film ran out 
before he could finish the selection. It is interesting to 
recall his performance in the preliminary experiments, when 
the subjects were left to read at their own inclination as to 
speed and comprehension and when questions were asked 
without previous instructions. Already, this subject stood 
relatively high in comprehension and low in speed, but he was 
then by no means at either extreme. Thus the more specific 
instructions here introduced rather accentuated the individual 
idiosyncrasies than favored a uniform attitude! 

The variability of pause-frequencies per line is shown in 
part B of the Table. It is generally conceded that any 
performance at the maximal efficiency will have a minimum of 
fluctuation; and it has already been found that rapid reading
is associated with a regularity in the number of pauses per
line.⁴ The Table shows that in general the pause-frequencies
varied by a standard deviation of 2.73 for vertical reading 
and 3.79 for reading in the horizontal axis. It is interesting 
to note that these figures, the means of the standard devia- 
tions, are clearly less than the standard deviations of the 
means (3.32 for vertical and 5.86 for horizontal reading). 
In other words, the reading of different selections by different 
individuals, even when the average pause-frequencies alone 
are considered, is more variable than the reading of different 
lines in the same selection by the same individual. This 
means that the various readings corresponding to the indi- 
vidual distributions that enter into our results cannot be 
considered “samples” in the sense that the successive readings
of different lines are “samples” of the same type of performance.


Too much significance must not be assigned to the exact measures of skewness 
and kurtosis as given in parts C and D of Table I, as the third and fourth moments 
are here very unreliable. The direction of skewness, however, shows a rather constant 
tendency: the mean β₁’s are both positive according to our proposed bifurcation of
interpretation, and the original data show that of the 111 distributions of pause- 
frequencies only 21 have a negative third moment. This is obviously to be explained 
by the possible range of pause-frequencies having a lower limit (at zero) but no upper 
limit, and by what may be called a motive of economy on the part of the subject toward 
making only necessary pauses. 


The separate entries for vertical and horizontal reading in 
Table I naturally suggest a comparison between the two 
styles of alignment. A direct comparison here is hazardous, 
however, since comparability could legitimately be assumed 
only for the pairs of reading selections from the same source 
for each individual subject, while the illegible records of 
course did not obtain in such systematic pairs. We must 
therefore refer to the individual pairs for strict comparison, 
and unpaired single records must be disregarded. The results 
of such a comparison are summarized in Table II, obtained 
from 43 pairs of pause-frequency distributions. Differences 
were calculated for each of the four statistical constants, and 
in the Table are presented (1) the ratio of positive to negative 


‘First noted by W. F. Dearborn, The psychology of reading, Arch. Phi!. Prychol. 
etc., 1906, no. 4. 








164 EUGENE SHEN 


differences, (2) the mean of the differences, (3) the standard 
deviation of the differences, (4) the standard error of the 
mean difference, (5) the mean difference in terms of its 
standard error, and (6) the chances of the mean difference 
being positive. A difference is here expressed as positive 
when the value for vertical reading is greater than that for 
horizontal reading. 

Table II shows that, on the average, it required 2.4 more 
pauses to read a horizontal line than a comparable one in 
the vertical. While of 43 cases there were as many as eight 
pairs where horizontal reading required fewer pauses than 
vertical reading, the difference is certainly very significant, 
if we consider the mean rather than the individual values. 
The mean difference, as shown in Table II, is more than 5 
times its standard error, so that the chances of its being 
positive, or the chances of vertical reading on the average 
requiring more pauses than horizontal reading, are as little 
as 4 in a hundred million.


TABLE II

DIFFERENCES BETWEEN 43 PAIRS OF PAUSE-FREQUENCY DISTRIBUTIONS
(VERTICAL LESS HORIZONTAL)

                                Average     Variability    Skewness    Kurtosis
                                   M             σ            β₁          β₂

Ratio + to - Diffs. ........     8 : 34       19 : 23       30 : 13     27 : 15
Mean of Differences ........     -2.42         -.426          .39         .44
S.D. of Differences ⁵ ......      2.96          1.20         1.11        1.60
S.E. of Mean Diff. .........       .45           .18          .17         .24
Mean Diff./S.E. ............     -5.38         -2.37         2.29        1.83
Chances Mean Diff. + .......   .00000004       .0089         .989        .966
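The last three rows of Table II follow from the two rows above them. The sketch below is an editorial illustration of that arithmetic, assuming the standard error is S.D./√43 and that the chances are read from the normal curve; the function names are hypothetical.

```python
import math

N_PAIRS = 43


def normal_cdf(z):
    """Area of the unit normal curve below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def mean_difference_summary(mean_diff, sd_diff, n_pairs=N_PAIRS):
    """Return (S.E. of mean difference, mean/S.E., chance the true mean is positive)."""
    se = sd_diff / math.sqrt(n_pairs)
    ratio = mean_diff / se
    return se, ratio, normal_cdf(ratio)


# (mean of differences, S.D. of differences) as printed in Table II.
TABLE_II = {
    "average": (-2.42, 2.96),
    "variability": (-0.426, 1.20),
    "skewness": (0.39, 1.11),
    "kurtosis": (0.44, 1.60),
}

if __name__ == "__main__":
    for name, (mean_diff, sd_diff) in TABLE_II.items():
        print(name, mean_difference_summary(mean_diff, sd_diff))
```

The standard errors reproduce the printed .45, .18, .17 and .24. The ratios and chances differ slightly from the printed ones, apparently because the table divides by the rounded standard errors.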




















5 The standard deviation here was directly calculated from the distribution of
the differences. To derive it from the standard deviations of the two distributions,
the correct formula for general use is:

$$\sigma_d^{2} = \sigma_1^{2} + \sigma_2^{2} - 2 r_{12}\,\sigma_1 \sigma_2 .$$


Unless measures in the two distributions are definitely known to be uncorrelated, 
omission of the last term, though too often practised, is never justifiable. A high 
correlation would considerably reduce the variability and increase the significance of 
the difference. If two groups of sticks are to be matched for length, a small average 
difference between the groups matters little if the matching is done at random, for 
correlation is absent and the variability of the difference will be large. But if the 







Since central tendency and variability are positively corre- 
lated in skew distributions of a positive third moment, one 
naturally expects to find the difference in variability tending 
in the same direction as the difference in average pause-frequency.
Table II shows that the pause-frequency distributions
for horizontal reading are more variable than for
vertical reading. The difference is less marked than the
difference in the number of pauses. Even then, the chances
of the mean difference reversing its sign are less than one in a
hundred. The differences in skewness and kurtosis, it will be
noted, are positive instead of negative. That is to say, 
while the pause-frequency distributions have a higher average 
and greater variability for horizontal than for vertical reading, 
they are more skew and more leptokurtic for vertical than for 
horizontal reading. The interpretation of these features we 
shall postpone till a later section, when the temporal aspect 
of the reading pauses and other characteristic differences 
between reading in the vertical and in the horizontal will 
have been considered. 

The analysis made thus far of the number of pauses treats 
the various lines in a passage without any discrimination. 
But it is reasonable to expect that besides the variation in 
content and expression, which has to be considered in indi- 
vidual selections and for which a general treatment here is 
impossible, the first line of a passage should require an initial 
adjustment on the part of the organism, observable by a 
larger number of pauses in its reading than on the average. 
Accordingly the differences in the number of pauses have 
been calculated for the reading of each passage comparing the 
first line and the mean of all lines (the first included). These 
differences are summarized in Table III, which gives the 
mean difference, the standard deviation of the differences, the 
standard error of the mean difference, and the mean difference 
in terms of its standard error, for vertical and horizontal 





matching goes on in a regular order, such that the longest in one group matches the
longest in the other, etc., a very slight average difference may then mean that every
stick of one group would be longer than the corresponding one of the other, for then
there exists a perfect correlation which would greatly reduce the variability and
increase the significance of the difference.
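The effect of the correlation term in the formula of this note can be seen numerically. The sketch below is an editorial illustration with invented standard deviations; the function name is hypothetical.

```python
import math


def sd_of_difference(sd_1, sd_2, r_12):
    """S.D. of the difference between two correlated measures."""
    variance = sd_1 ** 2 + sd_2 ** 2 - 2 * r_12 * sd_1 * sd_2
    # Guard against a tiny negative value produced by rounding when r_12 is 1.
    return math.sqrt(max(variance, 0.0))


if __name__ == "__main__":
    # Invented figures: two groups of sticks with S.D. 3 and 4.
    for r in (0.0, 0.5, 0.9, 1.0):
        print(r, sd_of_difference(3.0, 4.0, r))
```

With matching at random (r = 0) the variability of the difference is at its largest; as the correlation approaches unity it shrinks toward the bare difference of the two standard deviations, so that the same mean difference becomes far more significant.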















reading both separately and together. The difference used is 
the number of pauses for the first line minus that for the 
average. 











                              TABLE III
      PAUSE-FREQUENCY DIFFERENCES BETWEEN FIRST LINE AND AVERAGE

Alignment Axis    Mean Diff.    S.D. Diff.    S.E. Mean Diff.    Mean/S.E.
Vertical             1.9           2.1             .29              6.6
Horizontal           1.5           4.2             .56              2.7
Both                 1.7           3.4             .32              5.3

















In general the first line of a passage required nearly two 
more pauses than the average for vertical reading and 1.5 
more pauses for horizontal reading. The standard deviation 
of the differences for horizontal reading is fully two times 
that for vertical reading. Consequently the standard error 
of the mean difference for horizontal reading is also larger, 
and the significance of the difference, as expressed by the 
mean difference in terms of its standard error, is less certain. 
The chances of obtaining a negative mean difference are less 
than four in a thousand for horizontal reading, and less than 
one in a billion for vertical reading. We can therefore 
conclude, in the first place, that the number of pauses is 
considerably reduced when the reader has completed the 
first line and got into the swing of the meaning. In the 
second place, we see that, either due to physiological differ- 
ences in the mechanism of eye-movement or due to differences 
in the amount of practice, the reduction is much more marked 
in vertical than in horizontal reading. 
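
The chances quoted above follow from treating the ratio of the mean difference to its standard error as a normal deviate. A minimal sketch in Python, assuming a one-tailed normal probability; the function name `chance_of_sign_reversal` is illustrative and does not come from the paper:

```python
import math

def chance_of_sign_reversal(ratio):
    """One-tailed normal probability that a deviate falls below -ratio.

    `ratio` is the mean difference divided by its standard error.
    """
    return 0.5 * math.erfc(ratio / math.sqrt(2.0))

# Ratios of mean difference to standard error, as read from Table III.
for axis, ratio in [("vertical", 6.6), ("horizontal", 2.7)]:
    print(f"{axis}: chance of a negative mean difference = "
          f"{chance_of_sign_reversal(ratio):.2e}")
```

Under this assumption the horizontal ratio of 2.7 falls below four in a thousand and the vertical ratio of 6.6 falls below one in a billion, in agreement with the statement in the text.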


THE LOCALIZATION OF PAUSES ON THE READING TEXT 


When the photographic film ran at right angles to the 
reading line, the relative magnitude of the successive eye- 
movements along the line was represented by the amount of 
displacement on the record. By making use of this fact, 
the approximate positions of the reading pauses could be 
found by means of an ordinary projection-lantern. The film 










could be pulled through a special frame on the lantern so 
that any portion might be projected as desired. On the wall 
opposite the lantern hung a wooden frame in which a board 
carrying the reading text could slide horizontally for a 
considerable distance. After adjusting the position of the 
lantern so that the ends of the lines in the copy coincided 
with the corresponding preliminary fixation-points in the 
record, the positions in each line of the copy where the pauses 
in the record fell were determined. 

The localization of reading pauses by such a method seems to have been extensively 
used by various investigators. But its usefulness and its reliability are fairly limited. 
As Dodge⁶ has pointed out, it is more correct to speak of a field than of a point of 
fixation; and the exact position of a reading fixation has little significance. It must 
be admitted, moreover, that the photographic record obtained by the method of 
corneal reflection is not absolutely accurate in showing the magnitude of movement 
at various angles, as it is in registering the number and duration of the pauses. When 
head-movements occur, though their distances are recorded by the reflection from 
the metal bead on the spectacle frame, it is impossible to make simple corrections 
which will insure an accurate placing of the fixations. We therefore used only those 
records where head-movements were not detectable; and even then the location was 
taken only to indicate the particular word, not any of its individual parts, upon which 
the fixation fell. 

An attempt was made at correlating the fixation pauses 
with the language units in the text, but it soon proved 
fruitless. While it was satisfying occasionally to find, for 
instance, a two-word term or phrase read with a single 
fixation between the two words, more often there was an 
utter lack of identifiable correspondence. Other generaliza- 
tions of the results here are also difficult. It can safely be 
stated, however, that the first word of each line was usually 
fixated while the last fixation more often than not fell short 
of the last word. The deviation was sometimes as much as 
three words or more from the end of a line. This point is 
particularly noteworthy because the last pause in a line 
generally had a duration somewhat shorter than the average, 
as we shall presently see. It is also significant in view of the 
finding of Crosland⁷ that the proofreader tends to overlook 


6 R. Dodge, An experimental study of visual fixation, Psychol. Rev. Monog. Suppl., 
1907, no. 35. 

7 H. R. Crosland, An investigation of proof-readers’ illusions, Univ. of Oregon 
Publ., 1924, Vol. 2, no. 6. See p. 135. 













mistakes especially near the end of a line. Comprehension 
would probably not be as much affected as the perception 
of the detailed optical features, since meaning is grasped in 
larger units and is continuous with what has preceded, and 
since the time consumed in shifting the eye to the next line 
may provide for further processes of understanding. 

From the average number of pauses per line presented in 
Table I, the average inter-fixation distance within a line 
(all lines were 6 2/5 inches in length) can be estimated. The 
placing of the pauses upon the reading-text in addition shows 
the range of the inter-fixation distances. In Table IV the 
maximum and minimum as well as the average distance are 
given in terms of the length in inches as measured on the 
text, of the number of spaces for words or punctuations, and 
also of the visual angle subtended. In order to facilitate 
cross references, the last two columns of the table also give 
the length of the reading line and the space for a word or 
punctuation in the same terms. 











                               TABLE IV
                 MAGNITUDE OF INTER-FIXATION DISTANCE

                                 Inter-fixation Distance
Kinds of Distance               Average    Max.      Min.      Line     Word
Length on text in inches ....     2/5      1 1/5     1/10      6 2/5    1/5
Number of words .............      2         6       1/2        32       1
Visual angle ................    1° 15′    3° 45′    19′        20°     37.5′




















The figures in terms of the visual angle subtended are only 
approximate estimates, as the reading distance somewhat 
varied with the different subjects. The angles should be 
slightly larger for myopes. Even with ample allowance for 
underestimation, the angular magnitudes are remarkably 
small, much smaller than in the reading of ordinary English. 
The inter-fixation movements in reading English, according 
to the estimates of Dodge and Cline,⁸ are between 2 and 7 
degrees. 


8 R. Dodge & T. S. Cline, The angular velocity of eye movements, Psychol. Rev., 
1901, 8, 145-157. 







THE DURATION OF THE FIXATION-PAUSE 


In photographing the eye-movements during reading, 
interruptions of the light illuminating the eye enabled the 
records to show the duration of each fixation-pause. The 
unit of time used was one-fiftieth of a second. While this 
measured the duration of the pause with all the precision 
that could be desired, it was much too long an interval to 
record with accuracy the time of the inter-fixation movements 
within a line. Dodge and Cline, who placed the angular 
magnitude of inter-fixation movement between 2 and 7 
degrees, gave 22.9σ as the average time required, with extremes 
at 14.2 and 36.5σ. Since the movements in our case 
covered a shorter distance, the time must also be shorter. 
From the velocities of longer horizontal movements, according 
to Dodge and Cline, 


    28.8σ for  5°        54.8σ for 20° 
    38.8σ  “  10°        80.4σ  “  30° 
    48.2σ  “  15°        99.9σ  “  40° 


and with the assumption that the time approaches zero as the 
magnitude of movement approaches zero, interpolation by 
Lagrange’s Theorem yields the following estimates for move- 
ments of less than 5 degrees: 


     9.9σ for 1° 
    17.1σ  “  2° 
    22.2σ  “  3° 
    26.0σ  “  4° 


From these figures it will not be far wrong to place the 
time for the average inter-fixation movement (1° 15’) between 
10 and 15σ. The long sweeping movements from the end 
of a line to the beginning of the next, covering 20 degrees or 
slightly less, would take about 54σ. This was verifiable in 
our records, which showed two or three interruptions in the 
photographed light representing the movement. While it 
has been found that movement in the vertical is slightly 
slower, the above figures are hardly precise enough to need 
any qualification. 
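
The interpolation described above can be retraced numerically. The sketch below is an assumption about the procedure, not a record of it: the text does not say which of Dodge and Cline's values entered the interpolation, so all six are used here together with the assumed origin (zero time at zero magnitude). The names `ANGLES`, `TIMES` and `lagrange` are illustrative.

```python
# Dodge and Cline's times for horizontal movements, in sigma
# (thousandths of a second), preceded by the assumed origin (0 degrees, 0 sigma).
ANGLES = [0.0, 5.0, 10.0, 15.0, 20.0, 30.0, 40.0]
TIMES = [0.0, 28.8, 38.8, 48.2, 54.8, 80.4, 99.9]

def lagrange(xs, ys, x):
    """Evaluate at x the Lagrange polynomial passing through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Estimated times for movements of less than 5 degrees.
for angle in (1, 2, 3, 4):
    print(f"{angle} degrees: {lagrange(ANGLES, TIMES, angle):.1f} sigma")
```

With these seven points the polynomial comes within about a tenth of a sigma of the estimates printed for 1, 3 and 4 degrees, which suggests, without proving, that the full set of values was used.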










The time for movements is thus much shorter than for the 
pauses (in any case not more than 5 per cent. of the total 
time) and is therefore relatively of little significance. Much 
more important is the consideration of the fixation-pauses. 
Since the eye in reading sees effectually only when it stops, 
the movements are significant in so far as the eye is on its 
way to the next stopping place. 

In treating the duration of the reading pauses we shall 
follow the same plan adopted in our consideration of the 


                                TABLE V
  SUMMARY OF STATISTICAL CONSTANTS OF 111 PAUSE-DURATION DISTRIBUTIONS

A. Average pause-duration in hundredth-seconds (M)
                  Mean of    S.D. of    Range of       Range of
                  Whole      Whole      Averages by    Averages by
                  Group      Group      Subject        Material
Vertical ......    30.5       3.27       23-35          27-34
Horizontal ....    29.4       3.12       21-33          27-31

B. Variability of pause-duration in hundredth-seconds (σ)
                  Mean of    S.D. of    Range of       Range of
                  Whole      Whole      Averages by    Averages by
                  Group      Group      Subject        Material
Vertical ......    12.6       3.34        9-19          10-15
Horizontal ....    11.0       1.62        8-12           9-12

C. Skewness of pause-duration distributions (β₁)
                  Mean of    S.D. of    Range of       Range of
                  Whole      Whole      Averages by    Averages by
                  Group      Group      Subject        Material
Vertical ......    3.06       2.52       .8-5.6         1.3-5.4
Horizontal ....    1.45        .98       .7-3.5          .8-2.0

D. Kurtosis of pause-duration distributions (β₂)
                  Mean of    S.D. of    Range of       Range of
                  Whole      Whole      Averages by    Averages by
                  Group      Group      Subject        Material
Vertical ......    8.25       5.31      3.9-11.7       4.7-12.1
Horizontal ....    5.64       2.47      3.2-10.4       4.0- 7.8






















number of pauses per line. We here have 55 pause-duration 
distributions for vertical reading and 56 for horizontal 
reading. The four statistical constants for these 111 distri- 
butions are summarized in Table V. 

The average pause-duration, as shown in part A of Table 
V, was roughly .3 sec. Variation of the individual readings 
was relatively small, yielding a standard deviation of about 
a tenth of the average. In the case of the pause-frequency 
per line, if we refer back to Table I, the standard deviation 
of the individual averages was more than a fifth of their 
mean value. This suggests that while the pause-frequency is 
more subject to the influence of the individual’s ability and 
experience and of the difficulty of material, the pause- 
duration is largely a function of certain physiological con- 
ditions which are relatively free from the influence of meaning 
in the material. This contrast undoubtedly holds to a certain 
extent, but it would be committing a grave error if one should 
go on to conclude that the average pause-duration is entirely 
independent of the subject and the material. Since the 
average of the standard deviations (Column 1 of part B in 
Table V) is about .12 second, and since the number of pause- 
durations in each distribution is on the average greater than 
144, the standard error of the mean pause-duration of a 
distribution according to the formula 

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

will not exceed .01 second. But the means actually obtained 
from the different distributions varied with a standard 
deviation greater than .03 second, at least three times what 
would obtain if the different distributions were “samples” 
of the same source of data. 
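
The argument of this paragraph reduces to a short calculation, sketched here in Python. The figures are the rounded bounds given in the text (about .12 second for the average standard deviation, at least 144 pauses per distribution, more than .03 second for the spread of the obtained means); the variable names are illustrative.

```python
import math

# All durations in hundredth-seconds, using the rounded bounds from the text.
sd_within = 12.0    # average S.D. of a single distribution (Table V, part B)
n_pauses = 144      # stated lower bound on pauses per distribution
sd_between = 3.0    # stated lower bound on the S.D. of the obtained means

# Standard error of a mean: S.D. divided by the square root of N.
se_mean = sd_within / math.sqrt(n_pauses)

# How many times larger the observed spread of means is than chance allows.
excess = sd_between / se_mean

print(f"standard error of a mean: {se_mean:.2f} hundredth-seconds")
print(f"observed spread is at least {excess:.1f} times the standard error")
```

Because 144 is a lower bound on N and 3.0 a lower bound on the spread of means, the ratio of three is itself a lower bound, which is what the phrase "at least three times" expresses.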

From parts C and D of Table V it is seen that the pause- 
duration distributions were both skew and leptokurtic. The 
mean β₁’s were 1.2 times the standard deviation above zero, 
and the mean β₂’s were about one standard deviation above 3. 
Of the entire 111 distributions, none had a negative third 
moment and only 8 had a β₂ below 3. As both skewness and 













leptokurtosis are indications of instability of the distribution, 
it is not unreasonable to suspect the pause-duration distribution 
of some infinite characteristic.⁹ One would then 
expect occasionally to find a pause of infinite or infinitesimal 
duration. Such a pause-duration is not at all inconceivable 
if one takes into consideration the fact that the saccadic 
eye-movements are but a convenient class and that they are 
theoretically difficult to distinguish from gliding movements 
of a continuous nature, which occasionally manifest them- 
selves in reading as unsteady fixations. 

A comparison of the last two columns throughout Table V 
shows that the range of the averages by subjects is in every 
case greater than the range of the averages by materials, 
again exhibiting the predominance of the individual subject 
over the reading material as a determining factor of the 
results. The deviation of the subject who had the shortest 
pause-duration (.23 and .21 second) was rather extreme. 
Reference to his standing in pause-frequency revealed neither 
correlation nor compensation. This subject was designated 
P in our preliminary report, where also his reading pauses 
were the shortest of the group. Of that subject, whom we 
have mentioned as having the best comprehension score as 
well as the greatest number of pauses, the pause-duration was 
about average in horizontal reading and well above the 
average in vertical reading. 

For a comparison of the pause-duration distributions 
between vertical and horizontal reading, differences obtained 
from 40 pairs of comparable records are summarized in Table 
VI. All the four constants are somewhat larger for vertical 
reading. 

The pause-duration was, on the average, about .01 second 
longer in vertical reading, showing an opposite tendency to 
differences in pause-frequency. But the differences here 
were less marked, considering their places in the speed of 
reading. In fact, the mean difference in pause-duration in 
favor of horizontal reading was more than counter-balanced 


9 Cf. T. L. Kelley’s discussion of instable distributions in his Statistical Method, 
pp. 146 ff. 










                               TABLE VI
    DIFFERENCES BETWEEN 40 PAIRS OF PAUSE-DURATION DISTRIBUTIONS
                      (VERTICAL LESS HORIZONTAL)

                            Average    Variability    Skewness    Kurtosis
                             M¹⁰          σ¹⁰            β₁           β₂
Ratio + to − Diffs. .....   27:10        24:10          28:12        26:13
Mean of Differences .....    1.15         1.825          1.84         2.69
S.D. of Differences .....    3.03         3.27           3.16         6.43
S.E. of Mean Diff. ......     .48          .52            .50         1.02
Mean Diff./S.E. .........    2.40         3.51           3.68         2.64
Chances Mean Diff. + ....   .9918        .99978         .9999        .9959

















by the pause-frequency difference in the opposite direction. 
If we take 15 pauses per line and .3 sec per pause as the 
approximate averages, a difference of .0115 sec per pause 
amounts to less than .2 sec per line, whereas a difference 
of 2.4 pauses per line means more than .7 second. Therefore, 
although vertical reading had longer pauses than horizontal 
reading, it was none the less more rapid, due to its fewer 
pauses. 
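
The balance struck in this paragraph can be laid out as a small calculation. The sketch below uses only the approximate averages quoted in the text; the variable names are illustrative.

```python
# Approximate averages quoted in the text.
pauses_per_line = 15       # pauses in a line
mean_pause = 0.3           # seconds per pause
extra_duration = 0.0115    # seconds per pause by which vertical pauses are longer
fewer_pauses = 2.4         # pauses per line saved in vertical reading

# Time lost per line in vertical reading through longer pauses.
loss_per_line = pauses_per_line * extra_duration

# Time gained per line in vertical reading through fewer pauses.
gain_per_line = fewer_pauses * mean_pause

net_gain = gain_per_line - loss_per_line
print(f"lost to longer pauses:  {loss_per_line:.2f} sec per line")
print(f"gained by fewer pauses: {gain_per_line:.2f} sec per line")
print(f"net advantage of vertical reading: {net_gain:.2f} sec per line")
```

On these round figures the loss stays under .2 second and the gain exceeds .7 second, so the net advantage of vertical reading is roughly half a second per line.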

As in the case of pause-frequencies, here also the differences 
in variability followed the same trend as the differences of 
averages. But more interesting are perhaps the differences 
in the skewness and kurtosis of the distributions. It is a 
very significant fact that the pause-duration distributions 
should show the same tendency as the pause-frequency 
distributions of the vertical toward a more marked skewness 
and leptokurtosis than the horizontal, especially since the 
differences in central tendency and variability show opposite 
trends in the two types of distributions. Since we have 
already noted that the reading records show an excess of 
unsteady fixations and gliding movements in vertical reading, 
and that a marked tendency to skewness and leptokurtosis 
is an indication of instability of the distribution, one can 
hardly escape the conclusion that the results of these two 
different observations come from some fundamental fact and 
are connected with the mechanism of eye-movements. To 
the physiological conditions of this difference as well as to its 


10 In units of .01 sec. 













bearing upon reading efficiency we shall return later in this 
paper. 

Having considered the general characteristics of the distri- 
butions of pause-durations, we shall now turn to the duration 
of certain classes of pauses. The first and last pauses of a 
line and pauses immediately preceding and following a 
regressive movement invite a special examination. Pauses in 
relation to the regressive movement will be treated under 
that caption in the next section; we shall here consider the 
first and last pauses in the line. 

The first pause in a line is significant because it occurs 
after a long movement,¹¹ and represents a more radical 
adjustment than the pauses succeeding it. The first pause 
is very often immediately followed by a regressive fixation, 
in which case it fulfills a less important function than other- 
wise, and this distinction should be made. The duration of 
the last pause is of interest particularly because of its marked 
inward deviation from the end of the line. The pause- 
durations of these three types are summarized in Table VII, 
which gives in each case the mean of the means, the standard 
deviation of the means, the mean of the standard deviations 
and the standard deviation of the standard deviations. As 
the variation of the same individual in reading various 
selections was relatively slight with respect to pause-duration, 
our calculations were made for each subject without reference 
to reading material. For purposes of comparison the corre- 
sponding figures for all pause-durations, already presented in 
Table V, are also given in Table VII. 

First pauses immediately followed by a regressive move- 
ment had an average duration of only .19 second, while those 
not so followed were on the whole as long as .32 second. 
The former thus amounted to only 60 per cent. of the latter, 
and were fully a tenth-second shorter than the general average. 
This value of .19 second seems to be in the order of the 
reaction-time of the eye, during which the eye (or the organism 
through the eye) discovers its position and is prepared for a 


11 This also holds for the first pause in the first line, since the subject was required 
to fixate a dot beside the end of the line before reading commenced. 







                               TABLE VII
              DURATION OF FIRST AND LAST PAUSES OF A LINE
                        (IN HUNDREDTH-SECONDS)

                       First Pause in Line                     General Pauses
Values              Followed by    Not Followed      Last
Computed            Re-fixation    by Re-fixation    Pause    Vertical   Horizontal
Mean of means ....     19.4           32.3           27.8       30.5        29.4
S.D. of means ....      1.9            4.5            5.2        3.3         3.1
Mean of S.D.’s ...      8.6           10.9           12.6       12.6        11.0
S.D. of S.D.’s ...      4.3            2.7            5.5        3.3         1.6











readjustment.¹² First pauses that were not immediately 
followed by regressive fixations were more variable in duration 
and their difference from the general average was less marked. 
It is not unjustifiable to say, with Dearborn, that the first 
pause in a line was longer than the average, except when 
immediately followed by a regressive movement. The duration 
of the last pause in a line was still more variable, but on 
the whole it was as much shorter than the general average 
as the first pause (not immediately preceding a re-fixation) 
was longer. 

An analysis of the speed of reading in terms of the number 
of words read per second will not be separately presented. 
But it can be readily estimated from the number of pauses 
and their duration, already fully considered. In general, 
the average was from 5 to 7 words per second; a rate slower 
than 3 words per second or more rapid than 10 words per 
second was very rare. 
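
The estimate of reading speed can be retraced as follows. This is a rough sketch under stated assumptions: 32 words to the line (Table IV), the group averages for pauses per line (Table I) and pause-duration (Table V), and no allowance for the time spent in movement, which the text places at not more than a few per cent. The function name `words_per_second` is illustrative.

```python
WORDS_PER_LINE = 32  # from Table IV

def words_per_second(pauses_per_line, pause_seconds):
    """Words read per second, counting pause time only (movement time ignored)."""
    seconds_per_line = pauses_per_line * pause_seconds
    return WORDS_PER_LINE / seconds_per_line

# Group averages: pauses per line (Table I) and pause-duration (Table V).
vertical = words_per_second(15.0, 0.305)
horizontal = words_per_second(18.3, 0.294)

print(f"vertical:   {vertical:.1f} words per second")
print(f"horizontal: {horizontal:.1f} words per second")
```

Both group averages fall within the range of 5 to 7 words per second stated above, the vertical near the upper end and the horizontal near the middle.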


THE REGRESSIVE FIXATION 


The occurrence of regressive movements, or movements in 
a direction contrary to the general progress of the reading, 
is undoubtedly a very noteworthy feature. It is an example 
of motor adjustment largely conditioned by the comprehension 

12 See R. Dodge & F. G. Benedict, Psychological effects of alcohol; and W. R. 
Miles, Alcohol and human efficiency. Carnegie Instit. of Washington Publ., 1915, 
no. 232; 1924, no. 333. 













of the material. In all probability, when fatigue is dominant 
and interest lacking, i.e., when a reader’s eyes are going over 
a printed page with little understanding of the content, the 
pauses are apt to be very regular and mechanical, and re- 
gressive movements may be entirely absent. At any rate, it 
is only reasonable to assume that the frequency of regressive 
fixations is not only indicative of the difficulty of reading 
material, but is as well proportional to the carefulness of the 
reader relative to his ability. 

We have already seen in considering the duration of the 
first pause in a line that when immediately followed by a 
regressive fixation it averaged less than .2 second long and 
when not so followed its average was over .3 second. Pauses 
other than the first in a line which were followed by regressive 
movements, however, did not seem to be very different from 
the general distribution; neither did the average duration of 
the regressive fixations themselves, whether immediately 
following the first pause or occurring elsewhere in the line, 
show any remarkable deviation. The exact figures are presented 
in Table VIII. 

                              TABLE VIII
                    AVERAGE TIME OF IRREGULAR PAUSES
                         (IN HUNDREDTH-SECONDS)

Alignment         Mid-line Pauses      Re-Fixations        Mid-line
Axis              before Re-Fix’n      after 1st Pause     Re-Fixations
Vertical ......        28.4                 31.2               29.4
Horizontal ....        28.3                 29.4               27.8














Of much more interest and importance is a consideration 
of the frequency of regressive fixations. Previous investi- 
gators have noted, and our preliminary experiments also 
showed, that regressive movements are particularly frequent 
after the first pause in a line. In the present case we found 
that the first pauses which were followed by regressive 
movements ranged from 0 to 75 per cent. for the different 
subjects. The average was 44.5 per cent. for vertical and 
50.7 per cent. for horizontal reading. This rather high 
frequency was largely due to the relatively long line (6 2/5 













inches) of the printed material, and showed the tendency for 
the eye-movement to be slightly inadequate in its first 
adjustment. The frequency of regressive fixations not im- 
mediately following the first pause in a line was found to be, 
on the average, 1.16 per line for vertical reading and 2.51 per 
line for horizontal reading. If we include both types in a 
single statement, i.e., count all the re-fixations, simple 
addition gives us 1.61 and 3.02, for vertical and horizontal 
reading respectively, as the average number per line. Since 
the average number of all pauses per line was 15.0 for the 
vertical and 18.3 for the horizontal (Table I), we can state 
the frequency of regressive fixations in terms of their pro- 
portion to the number of all pauses. We find that on the 
average there is one re-fixation out of every 9 pauses in 
vertical reading, and one out of every 6 in horizontal reading. 
These averages must of course be taken with great caution: 
allowance must be made not only for variations among 
readers and among selections of material, but also for irregu- 
larities at different places within the same selection read by 
the same subject. 
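
The "simple addition" of this paragraph rests on the fact that each line has exactly one first pause, so that the percentage of first pauses followed by a regression is also the average number of such re-fixations per line. A minimal sketch of the arithmetic, with illustrative names:

```python
# Share of first pauses followed by a regressive movement (one first pause per line).
after_first_pause = {"vertical": 0.445, "horizontal": 0.507}
# Re-fixations per line not immediately following the first pause.
elsewhere_in_line = {"vertical": 1.16, "horizontal": 2.51}
# Average number of all pauses per line (Table I).
all_pauses = {"vertical": 15.0, "horizontal": 18.3}

def refixations_per_line(axis):
    """Total re-fixations per line: those after the first pause plus the rest."""
    return after_first_pause[axis] + elsewhere_in_line[axis]

def pauses_per_refixation(axis):
    """How many pauses, on the average, contain one re-fixation."""
    return all_pauses[axis] / refixations_per_line(axis)

for axis in ("vertical", "horizontal"):
    print(f"{axis}: {refixations_per_line(axis):.2f} re-fixations per line, "
          f"one in every {pauses_per_refixation(axis):.1f} pauses")
```

The totals come out close to the 1.61 and 3.02 stated above, and the proportions to roughly one re-fixation in nine pauses for the vertical and one in six for the horizontal.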

Variability is best appreciated by citing extreme cases at the limits of the range. 
One of the subjects who had the smallest number of re-fixations on the average made, 
in reading Selection C2 in the vertical, only .10 per line, or one re-fixation in ten lines. 
The other extreme is represented by a subject who in reading Selection A1 in the 
horizontal made almost 10 re-fixations per line, and at one place in the record there 
were as many as four regressive movements in succession, i.e., without forward move- 
ments intervening. This was the slowest subject, to whom reference has been made 
repeatedly, and he made more than 40 pauses per line in reading that particular 
selection. 

For a comparison between vertical and horizontal reading, 
differences were calculated from 43 pairs of comparable 
records, with respect to the frequency of re-fixation per line. 
Of the 43 cases, 38 showed more re-fixations in horizontal and 
only 5 showed more re-fixations in vertical reading. They 
numerically distribute about a mean of .974 (horizontal less 
vertical) with a standard error of .147. The mean difference 
is thus 6.6 times the magnitude of its standard error, and 
according to the normal probability integral the chances of 
obtaining a mean difference in favor of horizontal reading 










are less than one in a billion. This is the most significant 
difference found between vertical and horizontal reading. 

In considering the average number of all pauses, we found 
(see Table II) that the mean difference between vertical and 
horizontal reading was 2.4 pauses per line. We now see that 
more than a third of this difference is in the number of re- 
fixations. Since we have just obtained the ratio of re- 
fixations to all fixations as between a ninth and a sixth, this 
difference of re-fixations is strikingly out of proportion, and 
should not be passed over as a subordinate fact fully implied 
in the differences of pause-frequency in general. Rather we 
must assume distinct additional factors at work. If regressive 
fixations are corrective of movements of a too-long span, it is 
not unreasonable to suppose that the eye in horizontal reading 
more than in vertical reading tends to move in a greater 
magnitude than desirable. We have already noted from the 
photographic records that horizontal movements are more 
regular and are very distinct from fixations, while vertical 
movements seem to have a tendency to shade over into 
unsteady fixations. We have further suggested, upon the 
evidence of skewness and leptokurtosis in the distributions of 
pause-frequency and pause-duration, that the distinction 
between saccadic movements and movements of a gliding 
nature is less clear-cut in vertical reading. These facts all 
point in the same direction. It is quite possible that the 
more complicated mechanism for vertical eye-movements may 
turn out to be an asset in reading, if it is better able to make 
the very small degrees of movement which the reading of 
Chinese requires. 


DISCUSSION: DIFFERENTIAL FACTORS IN VERTICAL AND 
HORIZONTAL READING 


The results of the foregoing analysis show that in the 
reading of Chinese vertical alignment compares favorably 
with horizontal alignment. This difference might of course 
be curtly dismissed by attributing it to the effect of habit. 
That longer practice in vertical reading must have enabled 
the subjects to perceive combinations of words somewhat 
more readily in the vertical than in the horizontal no one will 



















refuse to admit. As an explanation, however, it could just 
as well have been conceived and offered before we made our 
series of observations upon eye movements, the aim of which 
is precisely a more searching analysis of the conditioning 
factors. Therefore, while realizing full well that habit or 
experience is at least responsible in part, and possibly in toto, 
we shall try to see if there is not something besides habit 
that may partly be accountable for the difference. In other 
words, though habit is a necessary and may even be a sufficient 
explanation, it is certainly neither exclusive nor exhaustive. 

In view, moreover, of the current movement in China to 
change from vertical into horizontal alignment, and of the 
readiness with which the enthusiastic seize every physiological 
fact concerning the eye in support of the “reform,” it is 
especially desirable to weigh the possibilities without bias and 
give the other side of the question a fair consideration. 

To the naive observer, the most striking fact that may 
give an advantage to horizontal over vertical reading is the 
relative position of the two eyes and the manner in which 
each eye opens. It is true that the eyes are thus permitted 
more free and extensive movements in the horizontal than 
in the vertical axis. But the actual range of vision required 
in reading is extremely limited, so that when a printed page 
is held in front of the eyes the more extended visual field in 
the lateral directions is of no service except in looking away 
from the reading text. 

An apparently more significant factor in reading is the 
retinal field of clear vision excluding eye movements. The 
study of Ruediger,¹³ whose subjects included a few individuals 
from Japan and China, has demonstrated that: “the shape 
of the field bounded by points of equal distinctness varies in 
different individuals from a ‘square-oval,’ about twice as long 
horizontally as wide vertically, to a circle.” But the same 
author found no correlation between the dimensions of the 
field and reading ability. Since the span of a pause in the 
reading of Chinese covers a specially short distance, we can 
the more safely conclude that the retinal field of fairly distinct 


13 W. C. Ruediger, The field of distinct vision, Arch. of Phil., Psychol., etc., 1907, 
no. 5. 















































vision, if its horizontal axis is longer than the vertical, will 
not necessarily make horizontal alignment the easier reading. 
For similar reasons, the greater speed of simple long move- 
ments in the horizontal does not necessarily save time in 
reading, just as a high-speed car would have little superiority 
for the purpose of sight-seeing in crowded streets. 

A relative merit which may seriously be claimed for 
horizontal reading is perhaps the fact that it does not involve 
movements of the eyelid so much as in vertical reading. 
It is at first thought rather apparent that lid-movements 
probably waste energy and possibly interfere with reading. 
But the actual effect upon reading no one yet knows. More- 
over, it can be said for the other side of the question that lid- 
movements accompanying vertical reading have the function 
of moistening the eye-ball and probably save some winking 
reflexes which would otherwise have been necessary. 

Against horizontal reading is to be mentioned the fact 
that it requires constant readjustment in convergence and in 
the relative accommodation of the two eyes. As convergence 
is a relatively slow and difficult adjustment, and as accommo- 
dating two eyes independently of each other or in opposition 
to each other is contrary to the usual practice of binocular co- 
ordination, the fact deserves serious consideration. Though 
its practical significance in reading is not yet definitely known, 
it is certainly one of the possible factors independent of habit 
which may in part account for the better results of vertical 
reading. 

The various characteristics of the movements in vertical 
and horizontal reading as shown in the photographic records 
are undoubtedly symptoms of some underlying differential 
factors. The shorter distance covered by the cornea in 
vertical movement is most likely due to a slight forward 
shifting of the center of rotation of the eye ball,¹⁵ in turn 

14 For illustrative records and explanations see this JOURNAL, 1925, 8, 344-362. 


15 If we follow Dodge’s calculation of the ratio of the movement of reflection to 
the actual movement of the cornea by: 

$$\frac{\text{distance from apex to center of rotation} - \text{radius of curvature of cornea}}{\text{distance from apex to center of rotation}}$$







































attributable to the peculiar pull exerted by the oblique 
muscles. The formation of a loop in the return sweep to a 
new vertical line may be accounted for by a temporal disparity 
in the action of the superior rectus and inferior oblique 
muscles. The effect of the difference in the muscular mechan- 
ism upon reading efficiency is not easy to evaluate. The 
suggestion is here ventured that it favors the horizontal in 
long sweeping movements of the saccadic character and makes 
shorter and somewhat gliding movements relatively easier in 
the vertical. 

That long movements are more rapid in the horizontal is a 
demonstrable fact, but its saving in reading is practically 
negligible. On the other hand, the flexibility shown by 
records of fixation in vertical reading, its greater tendency 
toward skewness and leptokurtosis in the distributions of 
pause-frequency and pause-duration, and the remarkably
smaller number of regressive fixations as compared with horizontal
reading, all point to the probability that the more
complicated mechanism for vertical movement is particularly 
adaptable for shifts of small angles and of a somewhat gliding 
nature. Since Chinese words are very compactly arranged, 
requiring, as we have seen, an average of not less than fifteen 
pauses in a line of about six inches, a greater facility in execut- 
ing movements of small magnitude or movements of a gliding 
nature will make a considerable contribution to reading 
efficiency. It is this factor, and the differences in convergence
and relative accommodation which probably contribute to the 
same end, that we should take into consideration as likely to 
be at work behind the differences between vertical and 
horizontal reading. 

Aside from the physiological mechanism of the eye, the 
printed type deserves consideration. While all characters are 
squares and can be arranged in horizontal lines just as well 
as in vertical columns, the structure of the characters may 
have different bearings upon the two styles of arrangement. 





and assume 13.5 mm and 7.7 mm as the approximate dimensions, and if we take the
distance on the photographic record covered by a vertical movement as .8 of that
covered by a horizontal movement of an equal angular magnitude, then the forward
shift of the center of rotation is estimated to be in the neighborhood of 1.2 mm.
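
The arithmetic of this note can be checked with a short sketch. It assumes, as one reading of Dodge's ratio, that for a given angular movement the recorded excursion of the corneal reflection is proportional to the numerator of the ratio, that is, to the distance from apex to center of rotation less the radius of corneal curvature. The function and variable names are illustrative only.

```python
# Sketch of the estimate in the note above (all names are illustrative).
APEX_TO_CENTER_MM = 13.5   # distance from corneal apex to center of rotation
CORNEA_RADIUS_MM = 7.7     # radius of curvature of the cornea
VERTICAL_RATIO = 0.8       # recorded vertical / horizontal excursion, equal angle


def dodge_ratio(apex_to_center: float, radius: float) -> float:
    """Movement of the reflection divided by actual movement of the cornea."""
    return (apex_to_center - radius) / apex_to_center


def forward_shift(apex_to_center: float, radius: float, ratio: float) -> float:
    """Forward shift of the center of rotation implied by a reduced record.

    Assumption: the recorded excursion scales with (apex_to_center - radius),
    so a record reduced to `ratio` implies a new distance d' satisfying
    d' - radius = ratio * (apex_to_center - radius).
    """
    new_distance = radius + ratio * (apex_to_center - radius)
    return apex_to_center - new_distance


horizontal_ratio = dodge_ratio(APEX_TO_CENTER_MM, CORNEA_RADIUS_MM)
shift_mm = forward_shift(APEX_TO_CENTER_MM, CORNEA_RADIUS_MM, VERTICAL_RATIO)
print(f"reflection/cornea ratio, horizontal: {horizontal_ratio:.2f}")
print(f"estimated forward shift: {shift_mm:.2f} mm")
```

Under that assumption the shift comes to about 1.16 mm, which agrees with the figure of roughly 1.2 mm given in the note.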
















In the first place, the vertical strokes are always heavier than 
the horizontal. Then, also, the frequency of the strokes in 
the two axes is not even approximately equal. An exami- 
nation of 160 words at random shows that there are about 
twice as many short horizontal as vertical strokes. The 
influence of such differences upon reading efficiency is difficult 
to conjecture. But it is not unreasonable to suppose that 
during its development the printing type has been better 
adapted for vertical reading, since that is the traditional 
practice. 


A group of American students who had no knowledge of the Chinese language 
were requested to compare the equivalent forms of the same passage printed in the 
two styles. Of 51 individuals, 38 preferred the vertical as the better style of arrange- 
ment, 7 preferred the horizontal, and 6 declined to make a judgment. The reasons 
given by those preferring the horizontal such as ‘more natural,’ ‘more easily seen,’ 
‘less strain’ and the like are very vague and indefinite. One subject stated that more 
in the horizontal could be seen at a glance. Among those who preferred the vertical, 
the most extensively given reasons were its general neat appearance and better spacing 
(by 15 subjects) or its being more esthetic and pleasing to the eye (by 10 subjects). 
These two groups might have meant the same thing. Other reasons given were ease 
in change of line, less fatigue, analogy to adding a column of figures, or such vague
statements as ‘more logical.’ 
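
The proportions behind these counts may be tallied as follows. This is only a sketch of the arithmetic; the dictionary keys are illustrative labels, and the last figure assumes that the two groups of reasons (15 and 10 subjects) did not overlap, which the report does not state.

```python
# Style preferences of the 51 students who could not read Chinese.
preferences = {"vertical": 38, "horizontal": 7, "no judgment": 6}
total = sum(preferences.values())

# Per cent of all subjects choosing each alternative.
shares = {style: 100 * count / total for style, count in preferences.items()}
for style, pct in shares.items():
    print(f"{style:12s} {pct:5.1f} per cent")

# The two reasons most often given by those preferring the vertical.
vertical_reasons = {
    "neat appearance and better spacing": 15,
    "more esthetic and pleasing to the eye": 10,
}
# Assumption: no subject is counted under both reasons.
reason_share = 100 * sum(vertical_reasons.values()) / preferences["vertical"]
print(f"vertical preferences resting on these two reasons: {reason_share:.1f} per cent")
```

About three quarters of the group (74.5 per cent) preferred the vertical style.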

A number of the writer’s acquaintances who read Chinese in both styles of align- 
ment also have the impression that the horizontal style seems less comfortably spaced 
and is more liable to mistakes in changing to a new line. All this may very well be 
due to the structure of the characters already pointed out. More extended experi- 
ments upon the legibility of the Chinese characters in the two styles of arrangement by 
means of brief tachistoscopic exposures are required to throw further light upon this 
phase of the problem. 


The above discussion of possibilities aside from habit is 
by no means intended for a dogmatic defence of vertical 
reading. The possibilities are pointed out merely as a
warning against indiscriminate appeals to physiological facts 
in support of the ‘reform.’ Practical considerations of 
facility in mathematical writing and foreign quotation may of 
course outweigh other factors and fully justify a complete 
change into horizontal alignment. At any rate, the adoption 
of horizontal print for the Chinese language must be on an 
“all or none” basis; the present practice of simultaneous use
of both styles is very unsatisfactory and confusing. Whether 
vertical reading is easier than horizontal, certain it is that 
either is easier than both. 













SUMMARY 


The results here reported in general verify and amplify 
our preliminary findings. The most outstanding facts may 
be summarized in the following statements. 

1. The reading of Chinese differs from that of English 
chiefly in the spatial distribution of the pauses. The square
shape of the Chinese words and their more compact arrange- 
ment in a line demand eye-movements of smaller angles and 
a decidedly greater number of pauses per line than in reading 
English. 

2. The inter-fixation span varies from half-a-word to six 
words. The average is two words. The duration of a 
fixation varies about an average of roughly .3 second. 

3. The first line of a passage is usually read with a greater 
number of pauses than on the average. The first pause in a 
line, unless immediately followed by a regressive movement, 
has a longer duration than the average. The last pause in a
line generally deviates inward from the last word, and it is 
somewhat shorter in duration than the average. 

4. The pause-duration in horizontal reading is shorter than 
in vertical reading. But this difference is more than counter- 
balanced by the longer inter-fixation span in vertical reading. 
Comparable passages are thus read faster in vertical than in 
horizontal alignment. 

5. Characteristic differences between vertical and horizontal
reading suggest certain physiological differences in the
mechanism of eye-movement. In vertical movement, the 
center of rotation of the eye ball is shifted forward, and there 
is also a temporal disparity between the actions of the superior 
rectus and inferior oblique muscles. There is a tendency of 
the eye in vertical reading to make gliding movements and 
unsteady fixations as opposed to a clear-cut distinction 
between jerks and pauses. 











A FURTHER CONTRIBUTION TO THE TACTUAL 
PERCEPTION OF FORM 


BY M. J. ZIGLER AND REBECCA BARRETT 
Wellesley College 


This investigation was undertaken with the twofold 
purpose of dealing more fully with certain aspects of a study 
recently reported¹ and of extending our investigations to
certain new conditions of the tactual perception of form. Our 
experimental work presented three principal problems. (1) 
To determine the influence of completeness and incomplete- 
ness of contour as distinct from total areal impression upon 
the perception, we used, in addition to the solid stimuli of the 
previous work, figures which were partially or completely 
outlined. (2) As regards the place of excitation, we stimu- 
lated on the palm of the hand, on the ball of the thumb, and 
on the volar surface of the forearm, in order to determine 
the influence of differences of sensitivity upon the appre- 
hension of form. And (3), with respect to central processes, 
we hoped to come to closer terms with the problem—already 
suggested and somewhat inconclusively dealt with in the 
reference given—of the immediacy or mediacy of this per- 
ception. If mediate types appeared, we proposed to note 
observable stages in the development of the perception. 


THE EXPERIMENTS 


The experimental conditions were practically the same as 
in the previous work. O was seated in a comfortable chair 
at the side of a low table. The bared right arm was extended,
volar side upward, over the edge of the table, which was 
padded for comfort. The parts stimulated were at all times 
concealed from O by a cardboard screen. Through a small 


1 M. J. Zigler & K. M. Northup, The tactual perception of form, Amer. J. Psychol.,
1926, 37, 391-397. 







notch in the bottom of the screen the hand of O was placed 
in position. 

The stimulus-figures were cut from hard-rubber stock of 
5 mm thickness. We used the same five figures as before:
square, equilateral triangle, right-angle triangle, diamond and
hexagon. There were three sets of each figure: one solid
(as in the previous work), one outlined (the central parts 
chiselled out to leave a narrow rim 2 mm wide along the 
edge), and one partially outlined or pointed. The last were 
constructed by mounting small rounded points of 2 mm 
elevation at the juncture of adjacent sides of the figure. 
There were three points for the triangles, four for the square 
and the diamond, and six for the hexagon. Five sizes were 
provided for every one of the figures in each of the three sets. 
The main dimensions (as given also in the previous paper) 
were 12, 14, 16, 18 and 20 mm. We had found that these 
dimensions range from the lower limit of clear perception of 
form to the region of optimal size for perception (p. 396). 

The stimuli were applied with the previously described 
(p. 391) mechanical applicator, which enabled us to apply 
all figures at roughly the same intensity of pressure, about 
500 gr. 

The figures of all three sets were thrown together and 
arranged in haphazard order. The stimuli were always 
applied to the same area and always in the same position. 
Thus in stimulation of the arm the apex of the triangle was 
always toward the wrist, and the parallel sides of the square 
and the hexagon and the longer axis of the diamond were 
always applied in the longitudinal direction of the arm. The 
same mode of application was used on the palm and the 
thumb. We worked on the volar forearm midway between 
the wrist and the elbow, on the palm at the juncture of the first and
second metacarpals, and on the central part of the ball of the 
thumb. A temporal interval of at least 20 sec separated all 
presentations; and longer resting periods were regularly given 
after 10 or 20 trials. From 50 to 75 presentations were made
at a sitting. 
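
The stimulus material described above can be enumerated in a brief sketch. Five figures in three sets at five sizes give 75 stimuli in all. The report does not say how the haphazard order was produced, so the seeded shuffle below is a modern stand-in and an assumption; the function names and the seed are likewise illustrative.

```python
import itertools
import random

FIGURES = {  # figure -> number of corners (the points of the "pointed" set)
    "square": 4,
    "equilateral triangle": 3,
    "right-angle triangle": 3,
    "diamond": 4,
    "hexagon": 6,
}
SETS = ("solid", "outlined", "pointed")
SIZES_MM = (12, 14, 16, 18, 20)

# Every figure in every set at every size: 5 x 3 x 5 = 75 stimuli.
STIMULI = list(itertools.product(FIGURES, SETS, SIZES_MM))


def haphazard_order(stimuli, seed=None):
    """Return the stimuli of all three sets mixed together in shuffled order."""
    rng = random.Random(seed)
    order = list(stimuli)
    rng.shuffle(order)
    return order


def sitting(order, n_presentations):
    """Take the presentations for one sitting (50 to 75 in the report)."""
    if not 50 <= n_presentations <= 75:
        raise ValueError("a sitting comprised from 50 to 75 presentations")
    return order[:n_presentations]


order = haphazard_order(STIMULI, seed=1927)
print(len(STIMULI), "stimuli; first presentation:", sitting(order, 50)[0])
```

A sitting of 75 presentations would thus run once through the whole material.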

One of us (Barrett) served as experimenter. The four O’s 













were Zigler (Z), a member of the staff; K. Ward (W), a senior 
‘major’ in psychology; and M. Davidson (D) and A. B. 
Hoffman (H), graduate students. W and Z were O’s in the 
previous work; but the others were untrained in tactual 
observation. The general instruction read as follows: 


After the ready signal, a tactual stimulus will be applied to the volar surface of 
the forearm (palm, or ball of thumb). You are to observe carefully and report the 
shape of the tactual impression and all observable criteria upon which your perception 
is based. If you prefer, you may sketch the shape of the impression on paper. 


RESULTS 


i. Adequacy of perception.—As in the previous study, the 
triangles were correctly perceived much more frequently than 
the other figures. They totalled more than two thirds of all 
the correct judgments from all three regions. The square 
was correctly judged more frequently than the diamond by 
all O’s except Z, who found the diamond much easier. This 
is probably due in part to the fact that the square is a more 
common figure than the diamond; so that the O’s working 
without knowledge more readily apprehended the square, 
while the O working with partial knowledge had no central 
disposition favoring any one of the five figures. Z reported 
that the diamond, as a rule, was more clearly given than the 
square; and other O’s, in the few instances when the diamond 
was correctly perceived, testified that the impressions were 
extremely clear and definite. The tendency to perceive the 
diamond as a triangle was very pronounced in the case of 
every O, and is probably the chief explanation of the fewness 
of correct ‘diamond’ judgments. The reverse tendency of 
perceiving the triangle as a diamond was negligible. As in 
the earlier study, the hexagon was rarely reported. Z 
recognized it several times in all three of the regions stimu- 
lated; but of the O’s working wholly without knowledge, it 
was recognized only by W who reported it 5 times upon the 
thumb. This figure usually gave a very indefinite, unclear 
and blurry outline; and the shape most frequently assigned 
by the O’s (other than Z) was the circle. Even Z repeatedly 
testified that this figure aroused the impression of a disc-like 
form. It is evident that the figures whose sides form acute 







angles give more definite tactual clues to physical character 
than do those whose sides form right or obtuse angles. This 
is probably due to the fact that the pressure gradient of acute 
angles is steeper than that of right or obtuse angles. The 
tabular results here are wholly similar to those of the previous 
work. 

ii. Positiveness or definiteness of perception.—For the purpose
of determining more precisely the influence of angle on
the definiteness of perception, we next added to the instructions
the demand that the degree of ‘positiveness, definiteness
or clearness’ of form be designated in all observations and in 
as many degrees as the O’s could easily discriminate. After a 
sitting or two they were inclined to indicate four degrees:
very positive assurance, fairly positive assurance (mild un- 
certainty), uncertainty (very little subjective assurance), and 
great indefiniteness (no notion of shape). 

Between 65 and 85 per cent. of all the ‘very positive’ 
judgments fall under the two triangles. For two O’s the 
equilateral had a slightly higher percentage than the right 
angle; for a third, there is no difference; while the fourth O 
gave a somewhat higher percentage with the rectangular 
form. Erroneous judgments were fairly rare with the two 
higher degrees of assurance. With regard to the three other 
forms, there is little consistency among the O’s, except that 
the hexagon is mentioned only four times. 

The most significant fact regarding the three tactual 
areas is that ‘positive’ perceptions are very much rarer than 
uncertain ones, in all regions. We give here a tabular presentation
of these orders of positiveness (from a, maximal, to d, 





                Arm                 Hand                Thumb
Obs.        a     b    c&d      a     b    c&d      a     b    c&d
Z           3     5     92      5     7     88      8    11     81
W           2     8     90      2    12     86      4    15     81
D           0     4     96      0     7     93      2     7     91
H           0     0    100      0     5     95      0     9     91
Total       5    17    378      7    31    362     14    42    344





































minimal). A sharp distinction between c and d was at times
hard to draw. As it seems to have small validity, the two 
degrees are thrown together. 

The table reveals several definite tendencies. (1) Less 
than 1/5 of all the perceptions stand under the first two 
degrees of assurance. (2) The percentages for a are smaller 
than for b. (3) The totals for assurance are highest in the 
thumb region. D and H seldom report high assurance, and 
their correct judgments are far fewer than those of the other two.
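
The three tendencies can be read off the table by a short computation. The sketch below is offered only as a check upon the arithmetic. It assumes that the rows stand in the order Z, W, D, H used elsewhere in this paper, and that the value for Z under a on the thumb is 8, as the column total of 14 requires; the names are illustrative.

```python
# Percentages of judgments at each degree of assurance (a, b, c&d).
# Assumptions: rows follow the order Z, W, D, H used elsewhere in the paper,
# and Z's thumb value for degree a (8) is inferred from the column total of 14.
ASSURANCE = {
    "Z": {"arm": (3, 5, 92), "hand": (5, 7, 88), "thumb": (8, 11, 81)},
    "W": {"arm": (2, 8, 90), "hand": (2, 12, 86), "thumb": (4, 15, 81)},
    "D": {"arm": (0, 4, 96), "hand": (0, 7, 93), "thumb": (2, 7, 91)},
    "H": {"arm": (0, 0, 100), "hand": (0, 5, 95), "thumb": (0, 9, 91)},
}
REGIONS = ("arm", "hand", "thumb")


def region_totals(table):
    """Sum each degree of assurance over the observers, region by region."""
    return {
        region: tuple(sum(obs[region][i] for obs in table.values()) for i in range(3))
        for region in REGIONS
    }


totals = region_totals(ASSURANCE)

# (1) Share of all judgments standing under the first two degrees.
high = sum(a + b for a, b, _ in totals.values())
overall = sum(sum(t) for t in totals.values())
high_share = high / overall

# (2) Degree a is rarer than degree b in every region.
a_below_b = all(a < b for a, b, _ in totals.values())

# (3) Region with the highest total for a and b together.
best_region = max(REGIONS, key=lambda r: totals[r][0] + totals[r][1])

for region, (a, b, cd) in totals.items():
    print(f"{region:5s}  a={a:3d}  b={b:3d}  c&d={cd:3d}")
print(f"share under a or b: {high_share:.3f}; highest assurance: {best_region}")
```

On these figures somewhat less than one tenth of all judgments fall under a or b, well within the limit of one fifth stated above.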

For the purpose of ascertaining whether differences in 
sensitivity influence the accuracy of perception, the figures 
were again presented to the three regions in a new series, 
the O’s observing under the same general instructions as 
before. The percentages of correct judgments, followed by 
the total trials (in parentheses), are as follows.








          Total
Obs.      No.         Arm          Hand         Thumb
          Trials
Z         1119      18 (271)     39 (500)     60 (348)
W         1194       9 (271)     22 (575)     44 (348)
D          890       4 (271)      5 (500)     23 (119)
H         1060       3 (271)      5 (500)     13 (289)

















Again the thumb gives maximal adequacy as well as 
maximal assurance and definiteness. The experiences in 
which form was poorly perceived were sometimes charac- 
terized as ‘dull’ and of a dispersive nature; while the pressure 
of a definitely given form was indicated as ‘sharper’ or ‘less 
blunt’ than that of a form indefinitely perceived. 
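
The internal consistency of this table, and the order of the three regions, may be verified with the following sketch. The rows are entered without observer labels, since neither check depends upon which observer a row belongs to; the function names are illustrative.

```python
# Each row: total trials, then (per cent correct, trials) for arm, hand, thumb.
ROWS = [
    (1119, (18, 271), (39, 500), (60, 348)),
    (1194, (9, 271), (22, 575), (44, 348)),
    (890, (4, 271), (5, 500), (23, 119)),
    (1060, (3, 271), (5, 500), (13, 289)),
]


def trials_consistent(row):
    """True when the regional trials add up to the stated total."""
    total, *regions = row
    return total == sum(trials for _, trials in regions)


def accuracy_rises(row):
    """True when per cent correct increases from arm to hand to thumb."""
    _, arm, hand, thumb = row
    return arm[0] < hand[0] < thumb[0]


checks = [(trials_consistent(row), accuracy_rises(row)) for row in ROWS]
print(checks)
```

Every row satisfies both conditions: the regional trials sum to the stated totals, and accuracy rises from arm to hand to thumb for each observer.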

iii. Influence of type of figure upon perception.—The
numerical results for our three types are as follows. They 
give, in percentages, the number of correct perceptions 
reported. 

It appears that the outlined figures are slightly more 
easily apprehended in all regions than either of the other 
types. The single exception is D (arm) where the percentages 
for outlined and solid figures are the same. The reports 
indicate that in the clearest perceptions mediated by outlined 
figures the attention is directed solely to the contour of the 































                    Total
Region     Obs.     Correct        Solid    Outlined    Point
                    Perceptions
Arm        Z           50            38        46         16
           W           26            38        42         20
           D            8            37        37         26
           H           13            31        62          7
Hand       Z           97            30        40         30
           W           81            38        40         22
           D           16            19        44         37
           H           16            37        44         19
Thumb      Z          211            32        38         30
           W          156            34        36         30
           D           55            38        60          2
           H           16            32        50         18





pattern; whereas with solid figures there is a central area of 
pressure which claims attention as well as the contour parts. 
These more central parts often cloud and obscure the total 
perception so that the form is not clear. The point figures
secure something less than half the percentage of correct 
judgments of the outlined figures; but they stand considerably 
nearer the solids in effectiveness for perception, thus showing 
the importance of the turning points of contour or outline. 
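
The comparison of the three types can be summarized numerically. The sketch below takes the unweighted mean of the twelve percentages for each type, which is an assumption as to the manner of combination, since the report does not say how the comparison was struck; the names are illustrative.

```python
# Per cent of correct perceptions by type of figure: (solid, outlined, point).
CORRECT_BY_TYPE = {
    ("arm", "Z"): (38, 46, 16),
    ("arm", "W"): (38, 42, 20),
    ("arm", "D"): (37, 37, 26),
    ("arm", "H"): (31, 62, 7),
    ("hand", "Z"): (30, 40, 30),
    ("hand", "W"): (38, 40, 22),
    ("hand", "D"): (19, 44, 37),
    ("hand", "H"): (37, 44, 19),
    ("thumb", "Z"): (32, 38, 30),
    ("thumb", "W"): (34, 36, 30),
    ("thumb", "D"): (38, 60, 2),
    ("thumb", "H"): (32, 50, 18),
}


def mean(values):
    """Unweighted arithmetic mean."""
    values = list(values)
    return sum(values) / len(values)


solid_mean = mean(row[0] for row in CORRECT_BY_TYPE.values())
outlined_mean = mean(row[1] for row in CORRECT_BY_TYPE.values())
point_mean = mean(row[2] for row in CORRECT_BY_TYPE.values())

# Cases in which the outlined figures do not exceed the solid figures.
ties = [key for key, (s, o, _) in CORRECT_BY_TYPE.items() if o <= s]

print(f"solid {solid_mean:.1f}, outlined {outlined_mean:.1f}, point {point_mean:.1f}")
print(f"point / outlined = {point_mean / outlined_mean:.2f}; exceptions: {ties}")
```

So taken, the point figures stand at about 0.48 of the outlined figures, and the only case in which the outlined figures fail to exceed the solids is D on the arm.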

iv. Tactual tied-images.—With the point figures, the O’s 
were frequently unaware of the punctal character of the 
isolated or discrete pressures. Instead they reported com- 
pletely outlined shapes. The isolated points seemed to 
condition a tactual tied-image² or filled-in perception.³ The
corners were described as clearer, more intense, more definite 
and bolder than the sides; yet there was at no time a sugges- 
tion that these components of the experience were of a 
substantially different character. It appears that, as in the 
case of visual⁴ and auditory⁵ tied-images, the tactual tied-image
is of the same qualitative nature as the corresponding
sensory experience. In many instances, especially on the 


2 Cf. E. B. Titchener, A beginner’s psychology, 1915, 75.

3 Cf. H. C. Warren, Human psychology, 1920, 262 f.

4 M. J. Zigler, An experimental study of visual form, Amer. J. Psychol., 1920,
31, 273 ff.

5 F. L. Dimmick, An experimental study of auditory tied-images, ibid., 1923,
34, 85-89.













thumb, where the two-point threshold is low, the separation 
of isolated points was noted; and in such cases the tied- 
image failed to connect the discrete sensory elements. The 
perception of shape was not, however, rendered impossible 
here. Visual imagery played a prominent role. Tactual 
completion was more characteristic with the smaller figures in 
the regions of higher threshold; although visualization played 
an accessory part in these cases. There were occasional undeveloped
or incompleted tendencies to tactual completion
(e.g., of one side, or a part of one side, of a form) even in the 
larger figures; but visual imagery was more characteristically 
resorted to as the mode of completion of the larger forms.⁶

v. The temporal course of the perception.—Two of our O’s 
made the comment that the forms were at times not immedi- 
ately perceived; that there was an observable temporal dis- 
junction between an initial phase and the mature perception. 
Accordingly, as the final step of our experimentation, we
presented the figures on the thumb with the following sentence
added to the general instructions: “Report especially upon
the temporal course of the experience.” H failed in two 
sittings to indicate temporal stages; but the other O’s 
succeeded in most trials. The three stages delineated were: 
(1) an initial or preliminary stage in which O realized a touch 
of indeterminate shape frequently ill-localized and generally 
labelled a “pressure blur”; (2) an intermediate stage in which
a prominent feature of the boundary of the form caught the 
attention and sometimes suggested a tentative but unsatis- 
factory shape, and (3) the final stage, in which the form was 
definitely perceived with more or less clarity. Frequently 
these stages were partially telescoped. Thus, stages 1 and 2 
were sometimes indiscriminable, the prominent feature of 
stage 2 catching the attention at the moment of stimulation. 
At other times, stages 2 and 3 were telescoped so that there 
was an immediate shift from the uncontoured pressure of 
stage 1 to the fully developed perception of stage 3. Again,


6 Our method of arousing pressure imagery (in the ‘tied’ form) is superior to
methods recently described by Braddock (An experimental study of cutaneous imagery,
ibid., 1921, 32, 415-420). Neither of her O’s realized, with positive assurance, the
existence of true cutaneous imagery.







the three stages were at times so completely telescoped that 
the transition from stage 1 to stage 3 was imperceptible.
The more characteristic occurrence was a suspended or 
immature perception, in which case the perception did not 
develop beyond stage 2 and remained unclear and indefinite, 
because only one or two pressure highlands forming a part 
of the contour were clearly perceived. 

The following reports are representative of the most 
clearly indicated stages: 

“In the first instant there was only a pressure of indeterminate shape. I knew 
that I had been touched; but the pressure lacked form and was indefinitely localized. 
It was only a fraction of a second, it seemed, when I noticed two angles quite clearly 
and at a considerable separation, and a moment later I observed the outline of the 
wider and intermediate part of the area and perceived a clear diamond form” (Z). 
“There was a large indefinite area of pressure, then quickly I perceived two parallel 
sides, and then the completely outlined square, which was more positively given this 
time than usual” (Z). “At first I felt an indefinite point, then I perceived the two
points, which represented the extremes of the figure, and, as I visualized, passing from
point to point, the form of the triangle appeared very clearly” (W). “First I felt an
indefinite line, then two sides were perceived; but they gave no notion of shape until
an instant later when a definite triangle was perceived” (W). “A vague pressure
area or field, then three corners appeared to give it the outline of a boundary, and in
the next moment I visualized a triangle” (D). “A pressure lacking definiteness,
then a corner caught the attention, and by the aid of visualization I got the relationship
of the several other corners which followed the first one and reported ‘square’” (D).


These stages follow one another with extreme rapidity, 
so that in many instances they are barely observable and 
the various components of the perception are constantly 
changing until the final stage is apprehended. Yet the stages 
are distinctive enough in most instances to justify the claim 
that tactual perceptions of form are not, as a rule, immediate; 
and that we have been able to push behind the perceptual 
pattern to a sensory matrix, which is pre-perceptive, indefinite, 
at times ill-localized, and which represents the nuclear part 
of the experience around which the perception of form 
subsequently develops. From this initial stage the perception 
develops, first by acquiring certain salient features of the 
contour of the figure, and, finally (where the perception fully 
matures), by the apprehension of a definite and complete 
form. We have also indicated the pronounced, ingrained 
disposition to grasp in fleeting succession any topographical 













highlands in terms of which the experience may assume shape 
once the sensory matrix, around which a form may be or- 
ganized, has been created. 


SUMMARY 


1. The forms of outlined figures are slightly more clearly 
perceived than those of solid figures; the forms of point 
figures (only corners designated) are less clearly perceived 
than those of solids. 

2. The perceptual form of point figures is, in many cases, 
completed by tactual tied-imagery, which bears very close 
psychological resemblance to the corresponding sensory ex- 
perience; in other cases, the completion is in terms of visual 
imagery. 

3. The perception of these figures is more adequate at the 
thumb than upon the palm of the hand or on the forearm. 
The arm is decidedly inferior to the hand. 

4. Figures having acute angles are correctly perceived 
much more frequently than are those having obtuse or right 
angles, owing, probably, to a steeper pressure gradient. 

5. The positive perception of form, with high assurance as 
to the particular shape, is relatively infrequent under our 
conditions; more infrequent from the hand or arm than from 
the thumb. 

6. When the perception rises gradually, the following 
stages may be distinguished: (a) a preliminary stage of 
shapeless pressure or pressure blur in which the experience is 
unclear and indefinite as to outline and frequently ill-localized; 
(b) an intermediate stage in which one or two salient features
of the form acquire definiteness and clearness; and (c) the
final stage in which the outline is clearly and completely 
given. These stages may be partially or completely tele- 
scoped, may overlap one another, or may so suffer arrest as to 
suspend perception at a point of immaturity and incom- 
pleteness. 





THE MIRROR TACHISTOSCOPE IN THE DRILL LABORATORY


By Glenn D. Higginson 


Psychological Laboratory, University of Illinois 


The following uses of a modified form¹ of the Dodge tachistoscope have been
made in our introductory laboratory course. Two 75-watt daylight lamps are so 
arranged in the tachistoscope as to illuminate the stimulus objects without themselves 
being seen. These lights are independently operated by knife-switches upon an 
adjoining table, while a Zimmermann contact clock, set in circuit with a telegraph 
sounder, is used for temporal control. With this arrangement several of the simpler 
experimental problems of vision have been effectively studied by the beginner in the 
laboratory. 

(a) Using cards of various hues and tints as stimulus objects and a gray of medium 
tint as a common projection ground, a description of positive or negative after-images 
is secured, together with a statement of complementary values in terms of hue and 
tint. A colored stimulus object, supplied with a fixation-mark, is placed in the one 
slide-holder and a plain gray card to serve as a background in the other. The stimulus 
object is exposed for the desired period of time by closing the appropriate circuit. 
The background is then exposed by releasing the one key and closing the other. In 
this way, one exposure can be instantly substituted for another without any significant 
change in the visually apprehended spatial relations. There is neither movement in 
the field nor appreciable change in illumination incident to such a shift; the stimulus 
object simply disappears and in its place the background appears. This background 
is also provided with a marker to aid in maintaining fixation. 

(b) The projection of colored after-images upon backgrounds of unlike hues
provides an effective method of relating the visual after-image to color-mixtures 
secured through the use of rotating discs. Various resultant hues ranging from low 
to high saturation are obtainable. A decided aid to the student’s understanding of 
the relative significance of the physical and the physiological factors concerned in 
the allied problem of color-mixing is furnished in this way. Since a similar phenomenal 
outcome is secured by the use of these stationary colors in the tachistoscope, the 
student is better able to understand that the ‘mixing’ of colors in connection with 
rotating discs cannot possibly occur on the face of the disc. Moreover, the student 
gains a better understanding of the fact that the physiological processes thrown into 
operation by the action of the stimulus upon the receptor organs are not coterminous 
with stimulation but pass through a gradual decline during which they may be actively 
modified by further changes in the conditions of stimulation. 

(c) The effect of various backgrounds of unlike brightnesses (blacks, whites 
and grays) upon the projected colored after-images is easily demonstrated by our 
apparatus. Here again comparisons are made with those changes in tint or brightness 
which are obtained through the addition of black or white to rotating colored discs. 


1 For a detailed description and drawing of our modified form, see the author’s 
article, The visual apprehension of movement under successive retinal excitations, 
Amer. J. Psychol., 1926, 37, 76-77. For a description of the original form of the 
tachistoscope, see R. Dodge, An improved exposure apparatus, Psychol. Bull., 1907, 
4, 10-13. 












(d) The general problem of the temporal and qualitative course of visual adapta- 
tion and its dependence upon continued stimulation lends itself to a study with this 
arrangement. Changing the period of stimulation enables the student to determine 
roughly the effect of the total length of exposition-time upon the qualitative and 
quantitative aspects of visual experience. 

(e) Through the addition of a small lateral light for fixation, above or below
the exposure field, a rough comparison can be made between foveal and extra-foveal 
sensitivity to various hues and lights. As the conditions of stimulation permit of 
very brief periods of excitation, the periphery and the fovea may be momentarily 
exposed to small colored squares, circles, triangles, and the like. 

(f) The influence upon visual perception of formal instruction, as well as certain 
other modifying conditions, has been studied with this set-up. An upright cross is 
used as one stimulus object and a similar cross so tilted that its arms bisect the four 
angles of the first is used as a second. The two are given alternately with the in- 
struction to apprehend, or to follow with the eyes, the wheel as it rotates to the right, 
to the left, or half-way around to the right and back to the left. Under such conditions, 
clear visual movement appears. Other stimulus objects, of various sizes, colors, and 
shapes capable of conditioning apparent movement in two or more directions were 
used. The most striking fact for the beginner is that under identical stimulus condi- 
tions a slight change in instruction leads to a sudden and complete change in perception. 

(g) Finally, by inserting an opaque card with a very small opening across the 
path of light the conditions of visual stimulation are reduced to a small colored or 
colorless object upon a lightless field. Under such conditions a simple study of the 
autokinetic movements is possible. While the movements are in course the stimulus 
light is momentarily flashed off to enable O to realize roughly the extent of ocular 
excursion which has occurred during the perception of apparent movement. 

The Dodge tachistoscope has been variously employed in many problems of 
research; but we do not know of its previous use in the general undergraduate labora- 
tory. The fact that this apparatus, when slightly modified, permits of such a wide 
variety of uses as is suggested above appears not to be without value for the beginning 
student of laboratory methods. And to the instructor it furnishes an effective means 
of presenting a number of simple experimental problems in the drill course in laboratory 
methods. 

















THE FIXATIONAL PAUSE OF THE EYES 


By P. W. Cobb and F. K. Moss 
Lighting Research Laboratory, Nela Park, Cleveland, Ohio 


Bidwell¹ has commented upon a recently published piece of work of ours² bearing 
the above title. His criticisms fall under three heads: faulty analysis of the experi- 
mental situation, failure to secure certain relevant (introspective) data, and inadequate 
control. 

In regard to the first of these, it is by no means inferable from the presence in, 
or absence from, the paper of a detailed analysis of the present state of knowledge 
bearing on the subject, that an adequate knowledge did or did not exist in the minds 
of the authors during the time the work was planned and executed. We are willing 
to waive this point. The merits of the technique depend only upon what was actually 
done, and we understand that our report has not failed to make this clear. 

The theory of the whole method is this: that the number of spots reported by 
the subject as seen coincides with the number of those which flash out during a period, 
within which the eye-movements are less than sufficient to displace the retinal image 
of any one to the point of excluding it from the group arranged in the arc of a circle. 
The critical degree of displacement is not stated, but is inherent among other things in 
the characteristics and dimensions of the apparatus. Bidwell’s essential criticisms all 
rest upon the question whether the number of flashes reported by the subject does 
actually correspond to the number so exposed in fact. The obvious way to answer 
this question is to gain independent information as to the last-named item. 

The suggestion of control experiments is good until one contemplates the technique 
of providing a limited series of flashes, of known number, which shall all fall within the 
period, indeterminate in advance, during which the eyes are fixated within the meaning 
of the term as implied in the foregoing. Without such rigid synchronization, or with 
stationary eyes, we could not consider such a control to be valid. 

A photographic check is possible and would, no doubt, prove useful; but, un- 
fortunately, we have not been able to carry out this procedure. If the technique of 
our method is new, and has potential value, as Bidwell¹ says, we can perhaps be pardoned 
for publishing the results and thereby making it possible for someone else to perfect it. 


¹ D. L. Bidwell, The fixational pause of the eyes, this Journal, 1927, 10, 62-63. 
² P. W. Cobb & F. K. Moss, ibid., 1926, 9, 359-367. 





















ANNOUNCEMENT 


The second annual meeting of the Mid-Western Experimental 
Psychologists will convene in Wieboldt Hall on the 
downtown Chicago campus of Northwestern University on 
Friday evening, May 13th, and continue through Saturday. 
Part of the time will be devoted to formal reports from the various 
laboratories and the remainder spent in informal discussion. 
All the meetings are open. 


